... Center 2007
-John Savill
The new System Center products offer plenty of SQL Server management help, such as more-detailed alerting, configuration management, and SQL Server-specific data protection.
InstantDoc ID 96071

A Bulk Approach to Business Age Calculation
-Marina Davydova
A single SELECT command is all you need to calculate the business age between two dates in a database.

Protect UDM with Dimension Data Security
-Teo Lachev
Leverage Analysis Services' role-based security model to restrict access to dimension members and the data associated with them.
InstantDoc ID 95998

KEVIN KLINE
ClearTrace
-Kevin Kline
Overwhelmed by all the data from those server-side traces or SQL Server Profiler sessions? This free graphical summary tool can help you make sense of it all and improve your system performance.
InstantDoc ID 96133

INSIDE SQL SERVER
Help for Query Hints
-Kalen Delaney
Plan guides let you instruct the optimizer about when to use a particular hint to boost a query's performance, but you should use them sparingly.
InstantDoc ID 96134

T-SQL BLACK BELT
Identifying Sections
-Itzik Ben-Gan
Need to identify sections of consecutive rows that share the same value? We demonstrate two techniques to solve the problem: one based on a subquery, and the other based on row numbers.
InstantDoc ID 95912

SOLUTIONS BY DESIGN
Discover the Star Schema
-Michelle A. Poolet
The star schema design will enable you to create the foundational data warehouse for your company's business intelligence solution.
InstantDoc ID 96112

Wait Before You Consolidate
-Michael Otey

MARKETPLACE

SELECT TOP(X)
System Center Data Protection Manager 2007
-Michael Otey
InstantDoc ID 95981

Products

COMPARATIVE REVIEW
Must-Have XML Tools
-William Sheldon
XML tools from Altova and DataDirect Technologies both excel at XML editing and XML Web services support, though each does so in different ways.
InstantDoc ID [illegible]

MARKET WATCH
The Move to Multicore
-Michael Otey
Multicore processors improve your systems by bringing better database performance and scalability without big increases in price or power requirements.
InstantDoc ID [illegible]

NEW PRODUCTS
Check out new and improved SQL Server-related products.
InstantDoc ID 96189

INDUSTRY BYTES
Dawn Cyr shares her insights from conversations with Interactive Edge and Neverfail Group.

on the WEB
READER CHALLENGE

Editorial

Wait Before You Consolidate

One of the most prevalent trends in IT today is server consolidation. Companies consolidate mainly to end server sprawl, reduce total cost of ownership (TCO), increase hardware utilization, and streamline IT resources. But many organizations aren't seeing these benefits. Their different departments and branch offices continually acquire software solutions, which tend to come bundled with server hardware. As these single-purpose servers accumulate, they increase the organizations' operating costs in terms of both direct hardware purchases and maintenance. In addition to costs, these accumulated servers also increase infrastructure requirements with their power, cooling, and networking demands. Not surprisingly, considering the massive increases in the computing power of today's multicore servers, these single-purpose systems tend to have utilization rates under 20 percent. More systems mean greater management efforts and reduced efficiency and flexibility. One well-known management adage sums up this situation: "Every penny spent on hardware requires a dollar to manage." Whether your organization will benefit from consolidation depends on the type of server system you have. Before you jump on the server-consolidation bandwagon, make sure your system is a good candidate for this popular solution.

By definition, server consolidation uses a shared hardware environment, and excessive system requirements by one or more of the servers involved will reduce the responsiveness of all the servers on the platform. Workloads that are best suited for server consolidation have low levels of both CPU and disk resource utilization, with occasional spikes of activity in which the utilization rate could approach 100 percent for short periods of time. Combining several of these low-level workloads can be advantageous because they don't tend to overload the target server. Combined, they can increase server-utilization rates while still maintaining acceptable service levels.

Relational database servers like SQL Server don't always fit into this mold. SQL Server systems often support multiple databases and must concurrently process multiple complex queries, which requires sustained high levels of CPU and memory utilization as well as high I/O throughput. In addition, because many of the applications supported by SQL Server databases are mission critical, the system must also provide high levels of responsiveness, a requirement that isn't well suited to server consolidation.

While it's true that many SQL Server installations have high system requirements that would cause them to play poorly in a server-consolidation environment, not all SQL Server systems are like that. Many SQL Server systems, especially small departmental and branch-office database servers, have modest requirements.

In the end, it all boils down to the characteristics of your system's workload. Workloads with sustained high rates of resource utilization, whether CPU or I/O, aren't good candidates for consolidation. Many products, such as Microsoft Operations Manager (MOM), third-party performance tools, and even plain old Performance Monitor, can help you understand your SQL Server system's workload characteristics. Once you learn those characteristics, you'll know whether server consolidation is right for your organization.
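
To make the distinction between a "mostly idle with brief spikes" workload and a "sustained high utilization" workload concrete, here is a minimal Python sketch. It assumes you have already exported a series of CPU (or disk) utilization percentages from Performance Monitor or a similar tool. The function name and all three thresholds (20 percent typical load, 80 percent spike level, 5 percent of samples allowed to spike) are illustrative assumptions, not figures prescribed by the editorial.

```python
from statistics import median


def is_consolidation_candidate(samples, low_pct=20.0, spike_pct=80.0,
                               max_spike_share=0.05):
    """Return True when utilization is mostly low with only brief spikes.

    samples: utilization percentages (0-100) collected at regular intervals.
    The thresholds are illustrative assumptions, not vendor guidance.
    """
    if not samples:
        raise ValueError("need at least one utilization sample")
    # Share of samples that count as a spike.
    spike_share = sum(1 for s in samples if s >= spike_pct) / len(samples)
    # Typical load must be low, and spikes must be rare.
    return median(samples) <= low_pct and spike_share <= max_spike_share


# Hypothetical sample data: a branch-office server and a busy OLTP server.
branch_office = [5, 8, 12, 6, 95, 7, 9, 10, 4, 6, 8, 11, 7, 5, 9, 6, 8, 10, 7, 6]
busy_oltp = [70, 85, 90, 75, 88, 92, 81, 79, 86, 90]

print(is_consolidation_candidate(branch_office))
print(is_consolidation_candidate(busy_oltp))
```

The median is used rather than the mean so that a single short spike doesn't distort the picture of the server's typical load.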

InstantDoc ID 96117 


Michael Otey (mikeo@windowsitpro.com) is technical director for Windows IT Pro and SQL Server Magazine and coauthor of SQL Server 2005 Developer's Guide (Osborne/McGraw-Hill).


Editor's Note: Post your feedback and tool recommendations on the Tool Time forum at http://www.sqlmag.com/go/tooltime.
Intermediate


ClearTrace 


A clear look into trace/Profiler data 


Do you have trouble making sense of all the data that a server-side trace or a SQL Server Profiler session produces? If so, check out ClearData Consulting's free ClearTrace tool. ClearTrace is a summary and graphical display tool for SQL Server 2005 and 2000 trace and Profiler files. In his work as an independent consultant, SQL Server MVP Bill Graziano, who also runs the http://www.sqlteam.com Web site, wanted to create a utility similar to the Microsoft SQL Server Product Support teams' Read80Trace command-line utility for processing SQL Server 2000 trace files but one that would run on SQL Server 2005 and would display its output graphically instead of as replay markup language (RML) files or a normalized database. ClearTrace is his solution for quickly getting valuable information from server-side trace files.

on the WEB
See the Web figure at InstantDoc ID 96133

Functionality

ClearTrace cleanly summarizes the query performance data that traces and Profiler collect and improves the performance-tuning process by making the assessment of SQL Server query performance easier and less arcane.

One of ClearTrace's most important features is that it "normalizes" SQL statements, eliminating variables and transient differences so that you can look at the impact of a certain class of statements. Normalization lets you know which statements are essentially identical, except for the parameters. The goal is to identify the statements that consume the most resources in aggregate. A statement that runs once and issues 100,000 reads, for example, isn't as bad as one that runs 100,000 times but issues 1,000 reads each time.

ClearTrace also performs the following operations so that it can group related types of SQL statements together and show you their impact as a category on SQL Server performance:

• Converts all constants (numeric, string, and date) to placeholders
• Renders prepared SQL and dynamic SQL code created by using the sp_executesql stored procedure as the actual statement that SQL Server executes
• Prefixes server-side cursors with "{ CURSOR }" for easy identification
• Pulls stored procedures from the RPC:Completed trace event and displays their names
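
The normalization idea described above can be illustrated with a short Python sketch. This is not ClearTrace's actual code: the regular expressions, the `{STR}` and `{##}` placeholder names, and the function names are assumptions chosen for illustration, and the sketch handles only string and numeric constants.

```python
import re
from collections import defaultdict

# Replace quoted strings first so digits inside them are not handled separately.
# '' inside a literal is T-SQL's escaped single quote.
_STRING = re.compile(r"'(?:[^']|'')*'")
# Standalone numbers only; digits that are part of an identifier are left alone.
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize(sql):
    """Collapse a statement to its shape: constants become placeholders."""
    sql = _STRING.sub("{STR}", sql)
    sql = _NUMBER.sub("{##}", sql)
    # Collapse whitespace and case so trivial differences do not split groups.
    return " ".join(sql.split()).upper()


def total_reads_by_statement(events):
    """Sum reads per normalized statement; events are (sql_text, reads) pairs."""
    totals = defaultdict(int)
    for sql, reads in events:
        totals[normalize(sql)] += reads
    return dict(totals)


# Hypothetical trace events.
events = [
    ("SELECT * FROM Orders WHERE OrderID = 42", 1000),
    ("SELECT * FROM Orders WHERE OrderID = 97", 1000),
    ("select * from Orders  where OrderID = 5", 1000),
    ("SELECT * FROM Customers WHERE Name = 'O''Brien'", 100000),
]

for statement, reads in total_reads_by_statement(events).items():
    print(reads, statement)
```

Grouping is what exposes the aggregate cost: one execution at 100,000 reads totals 100,000 reads, whereas 100,000 executions at 1,000 reads each total 100,000,000 reads.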


ClearTrace's second-most important feature is how it displays results. The product includes a simple query tool to graphically display the trace or Profiler performance data and groups the results by SQL text, application, host, and login. You can filter by application, host, or login values, and you can sort the result sets further by CPU, reads, writes, or duration of operation, as the Results tab in Web Figure 1 (http://www.sqlmag.com, InstantDoc ID 96133) shows.

ClearTrace also sequentially processes all trace files from a trace, making it easy to 
work with large trace sessions. It stores all the data it collects in a SQL Server database 
for later reporting and performance assessment. And it can automatically move trace 
files into an archive directory after it has processed them. 

CLEARTRACE

BENEFITS: Summarizes data from server-side traces and SQL Server Profiler sessions; normalizes SQL statements into common categories for group analysis; displays results graphically.

SYSTEM REQUIREMENTS: Must be installed on a computer running SQL Server 2005; can store the source trace files and repository performance data in either SQL Server 2005 or SQL Server 2000, so you can use it to monitor both releases.

HOW TO GET IT: Download ClearTrace for free at http://www.cleardata.biz/cleartrace/download.aspx.


System Requirements

ClearTrace must run on a computer that has SQL Server 2005 Developer, Workgroup, Standard, or Enterprise Edition installed. If you're using SQL Server 2005 Express Edition, you need to manually install the required SQL Server 2005 Management Objects (SMO) libraries. (You can download these libraries at http://www.microsoft.com/downloads/details.aspx?FamilyID=df0ba5aa-b4bd-4705-aa0a-b477ba72a9cb&DisplayLang=en. Note: The SMO Feature Pack doesn't always install all the necessary features, so you might need to install SMO from the SQL Server product installation CD.)

You can download ClearTrace and find instructions, sample screens, and a video demo of the product at http://www.cleardata.biz/cleartrace/download.aspx. Give us your feedback about this and other products on the Tool Time forum at http://www.sqlmag.com/go/tooltime.

InstantDoc ID 96133


Kevin Kline (kevin.kline@quest.com) is the director 
of technology for SQL Server Solutions at Quest Software, 
president of the international Professional Association for SQL 
Server (PASS), and the author of SQL in a Nutshell, 2nd edition 
(O'Reilly Media). 

A Clever Way to Connect to a Hidden Named Instance

We were experiencing problems connecting to a hidden SQL Server 2005 named instance. The only way we could successfully connect to the hidden named instance was through an alias on a client machine.

When I searched the Internet for a solution, I noticed that all the material I read said the connection strings had to be in the format ServerName\InstanceName,PortNumber (e.g., ProdServer\Reports,3334). This got me thinking. When this format is used, the connection string doesn't work with hidden named instances because the string is identifying the named instance. I wondered what would happen if the named instance wasn't part of the string, so I removed it. When I tried the revised connection string, which followed the format ServerName,PortNumber (e.g., ProdServer,3334), I was able to connect to the SQL Server 2005 named instance.

I've found that the shortened connection string also works with hidden SQL Server 2000 named instances, many tools (including Enterprise Manager), and ODBC connection strings. You can even apply the concept to Java Database Connectivity (JDBC) connection strings. For example, our JDBC driver uses a connection string that follows the format ServerName:InstanceName:PortNumber (e.g., ProdServer:Reports:3334). When I used only the server name and port number in the format ServerName:PortNumber (e.g., ProdServer:3334), I was able to connect to the hidden SQL Server 2005 named instance.

The shortened connection string works on any instance hosted on any server. I even made it a personal best practice to use the port number, even for the default port (1433), so I can better distinguish one instance from another. However, the shortened connection string only partially works with the dedicated administrator connection. So far, I've only been able to connect to an instance with the dedicated administrator connection through SQL Profiler and an ODBC connection string but not through SQL Server Management Studio (SSMS).

-Gilles Despaties, Senior Database Administrator, House of Commons of Canada
InstantDoc ID 96058

Share Your Experiences
Share your SQL Server code, comments, discoveries, and solutions to problems. Email your contributions to r2r@sqlmag.com. Please include your full name and phone number. We edit submissions for style, grammar, and length. If we print your submission, you'll get $100.

on the WEB
Download the code at InstantDoc IDs 96056 and 96059

Have SQL Server Email You Error Messages Generated by Job Failures

When a SQL Server job fails, you can have SQL Server Agent send you a notification. However, as Figure 1 shows, the notification doesn't include the error message generated by that job failure. So, you have to connect to SQL Server to read the error message to determine whether the failure is being caused by a critical problem.

Because it's helpful to receive the error messages generated by failed jobs, I created a SQL Server job step that calls a stored procedure named spDBA_job_notification. The job step passes the failed job's ID to the stored procedure. The stored procedure uses the job ID to query the msdb agent tables for the most recent error message for that job. The stored procedure incorporates the error message into an email and sends the email to the specified person. Figure 2 shows an example of an email sent by the stored procedure.

FIGURE 1 Sample notification sent by the SQL Server Agent

JOB RUN: 'Test_Database_Mail' was run on 1/25/2007 at 3:32:08 PM
DURATION: 0 hours, 0 minutes, 0 seconds
STATUS: Failed
MESSAGES: The job failed. The Job was invoked by User <domain>\<username>. The last step to run was step 3 (Report Failure Detailed). The job was requested to start at step 1 (Wait for 13 min).

FIGURE 2 Sample email sent by the stored procedure

From: <servername>@test.com
Sent: Thursday, January 25, 2007 2:23 PM
To: Ahmed, Jameel
Subject: Test_Database_Mail FAILED on \\<servername>

Job_name = Test_Database_Mail
Step name = Simple select
DB Name = master
Run Date = Jan 25 2007 5:22PM
Severity = 15
Error = Executed as user: <domain>\<username>. Invalid column name 'originatin_server'. [SQLSTATE 42S22] (Error 207) Invalid column name 'descriptio'. [SQLSTATE 42S22] (Error 207). The step failed.
Command = select job_id, originatin_server, name, descriptio from msdb..sysjobs

To pass the job ID to the stored procedure from within the job, I use a SQL Server Agent token. SQL Server Agent lets you use tokens in T-SQL job step scripts. In SQL Server, various tokens represent job elements. For example, the SRVR token represents the server running SQL Server, the A-DBN token represents the name of the database running the job, and the JOBID token represents the job ID. (For a complete list of tokens, see the "Using Tokens in Job Steps" section in SQL Server 2005 Books Online (BOL), which you can access at http://msdn2.microsoft.com/en-us/library/ms175575.aspx.) When you insert a token in a job step script, SQL Server Agent replaces the token with the element it represents at run time.

FIGURE 3 Setting up the job steps

You set up job steps in SQL Server 
Management Studio (SSMS). Open SSMS, 
expand SQL Server Agent (it must be run- 
ning), then expand Jobs. Right-click the 
target job and select Properties. In the Job 
Properties dialog box, select Steps from the 
Select a page menu on the left. Figure 3 shows 
an example of the Job step list section that 
appears. 

In Figure 3, note that there are two steps. In step 1, the job is run. In this case, the job is a T-SQL script named Simple select. (You can download SQLJob_Create.sql, which will create this test job, from the SQL Server Magazine Web site.) When the Simple select script runs without any problems, a completion status of success is logged. When the Simple select script encounters a problem that causes it to fail, step 2 executes.

FIGURE 4 Setting up step 2

Figure 4 shows an example of the settings for step 2. Because the spDBA_job_notification stored procedure resides in the master database, the Database drop-down list is set to master. If you're running SQL Server 2005 SP1 or later, the command for step 2 is


EXEC spDBA_job_notification $(ESCAPE_NONE(JOBID))


If you're running a version earlier than SQL Server 2005 SP1, the command is


EXEC spDBA_job_notification [JOBID]


This command calls the spDBA_job_notification stored procedure. The SQL Server Agent engine replaces the $(ESCAPE_NONE(JOBID)) or [JOBID] token with the job ID before executing the stored procedure. You can download spDBA_job_notification.sql from the SQL Server Magazine Web site. To use this script, you need to have Database Mail set up. If it isn't set up, you can do so with DBMail_Setup.sql, which you can also download from the SQL Server Magazine Web site.
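
The token replacement that happens before the job step runs can be pictured with a small Python sketch. It only mimics the text substitution described above; it is not how SQL Server Agent is implemented. The function name, the regular expression, and the sample job ID value are assumptions made up for illustration.

```python
import re

# Matches either the SP1-and-later form $(ESCAPE_NONE(NAME)) or the older [NAME] form.
_TOKEN = re.compile(r"\$\(ESCAPE_NONE\(([\w-]+)\)\)|\[([\w-]+)\]")


def expand_tokens(command, values):
    """Replace known tokens in a job step command with their run-time values."""
    def replace(match):
        name = match.group(1) or match.group(2)
        # Unknown names (for example, a bracketed identifier such as [dbo])
        # are left exactly as written.
        return values.get(name, match.group(0))
    return _TOKEN.sub(replace, command)


# Made-up run-time value; a real job ID is supplied by SQL Server Agent.
run_time_values = {"JOBID": "0x1234ABCD"}

print(expand_tokens("EXEC spDBA_job_notification $(ESCAPE_NONE(JOBID))", run_time_values))
print(expand_tokens("EXEC spDBA_job_notification [JOBID]", run_time_values))
```

Both command forms expand to the same text, which is why the stored procedure itself doesn't need to know which token syntax the job step used.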

If you were to manually supply the job 
ID and execute spDBA_job_notification, 
the stored procedure wouldn’t work. The 
stored procedure must be called from within 
a job step because it checks for the most 
recent job history without a completion 
status. When you run it in SSMS, the job is 
already completed and has a status of either 
success or failure. 
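
One plausible reading of this behavior can be sketched in Python. The sketch assumes history rows shaped like those in msdb.dbo.sysjobhistory, where a row with step_id 0 records the outcome of the whole job and run_status 0 means failed. The function name and the sample rows are hypothetical, and the downloadable stored procedure may be written differently.

```python
def latest_error_for_running_job(history):
    """Return the newest failed-step message from the run still in progress.

    history: dicts with instance_id, step_id, run_status, and message keys.
    Rows written after the last job-outcome row (step_id 0) are treated as
    belonging to a run that has not finished yet.
    """
    rows = sorted(history, key=lambda r: r["instance_id"])
    last_outcome = max(
        (i for i, r in enumerate(rows) if r["step_id"] == 0), default=-1
    )
    current_run = rows[last_outcome + 1:]
    failed = [r for r in current_run if r["run_status"] == 0]
    return failed[-1]["message"] if failed else None


# Hypothetical rows: yesterday's finished run, then today's run in progress.
in_progress = [
    {"instance_id": 1, "step_id": 1, "run_status": 1, "message": "ok"},
    {"instance_id": 2, "step_id": 0, "run_status": 1, "message": "The job succeeded."},
    {"instance_id": 3, "step_id": 1, "run_status": 0, "message": "Invalid column name"},
]
# The same history once the job has finished and its outcome row is written.
finished = in_progress + [
    {"instance_id": 4, "step_id": 0, "run_status": 0, "message": "The job failed."},
]

print(latest_error_for_running_job(in_progress))
print(latest_error_for_running_job(finished))
```

Called while the job is still running, the lookup finds the failed step's message. Called after the job has completed, there are no rows without a job outcome, so nothing is returned, which matches the behavior described for a manual run in SSMS.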

If someone were to make changes to 
the stored procedure that would cause it to 
fail or if someone were to comment out 
the email section in the code, you wouldn't 
receive any notifications. For this reason, I 
highly recommend you also set up the job’s 
Notifications page as a backup measure so 
that you get notified of job failures. You can 
access the Notifications page from the Job 
Properties dialog box. 

—Jameel Ahmed, Database Administrator/ 
Analyst, Canaccord Capital Corp. 
InstantDoc ID 96056 


Queries Dragging? Try 
Defragging 

Did you ever have a user tell you a query is taking a lot longer to complete than before, even though nothing in it has changed? If so, there's a good chance that the indexes in the table that the query ran against have become fragmented. Fixing this problem is a two-step process. First, you need to determine which indexes have become fragmented. Second, you need to defrag those indexes. I wrote a stored procedure, cspDefragIndexes, that automatically performs both steps. You can use cspDefragIndexes to analyze all the indexes in a single table or a whole database to determine whether they're fragmented. You can also use cspDefragIndexes to defrag that table or database. The stored procedure even updates all the statistics.

You can download the cspDefragIndexes 
stored procedure from the SQL Server 
Magazine Web site. To run it, you need to 
provide two parameters. The first parameter 
is the table name. Or, you can specify 'ALL' 
to work with all the tables in the database. 

TABLE 1 Percentage of Fragmentation Before the Defrag Operation

TableName  IndexName               IndexId  Part  FragPct   Fill  IndexType     Unique  PKIndex
Customer   PK_Customer             1        1     81.71806  0     CLUSTERED     Y       Y
Customer   FK_CustCSPID            2        1     20        0     NONCLUSTERED
Customer   IX_CustNo               3        1     76        0     NONCLUSTERED
Customer   IX_LastName             4        1     90.69767  0     NONCLUSTERED
Customer   IX_FirstName            5        1     58.33333  0     NONCLUSTERED
Customer   IX_Customer_DBA         6        1     88.88888  0     NONCLUSTERED
Customer   FK_CustAcctsRecID       7        1     41.66666  0     NONCLUSTERED
Customer   IX_Cust_RateID          8        1     89.47368  0     NONCLUSTERED
Customer   FK_CustSiteAddrID       9        1     41.66666  0     NONCLUSTERED
Customer   FK_CustMailAddrID       10       1     41.66666  0     NONCLUSTERED
Customer   IX_Customer_CorrAddrID  11       1     41.66666  0     NONCLUSTERED
Customer   FK_PrevCustID           12       1     41.66666  0     NONCLUSTERED
Customer   FK_MasterCustID         13       1     4.66666   0     NONCLUSTERED
Customer   IX_Customer             14       1     76        0     NONCLUSTERED  Y

Database id = 8
Defrag indexes for table Customer
Starting process for 14 indexes
Executed ALTER INDEX PK_Customer ON dbo.Customer REBUILD
Executed ALTER INDEX FK_CustCSPID ON dbo.Customer REORG
Executed ALTER INDEX IX_CustNo ON dbo.Customer REBUILD
Executed ALTER INDEX IX_LastName ON dbo.Customer REBUILD
Executed ALTER INDEX IX_FirstName ON dbo.Customer REBUILD
Executed ALTER INDEX IX_Customer_DBA ON dbo.Customer REBUILD
Executed ALTER INDEX FK_CustAcctsRecID ON dbo.Customer REBUILD
Executed ALTER INDEX IX_Cust_RateID ON dbo.Customer REBUILD
Executed ALTER INDEX FK_CustSiteAddrID ON dbo.Customer REBUILD
Executed ALTER INDEX FK_CustMailAddrID ON dbo.Customer REBUILD
Executed ALTER INDEX IX_Customer_CorrAddrID ON dbo.Customer REBUILD
Executed ALTER INDEX FK_PrevCustID ON dbo.Customer REBUILD
Bypassed fragmentation for FK_MasterCustID
Executed ALTER INDEX IX_Customer ON dbo.Customer REBUILD
update statistics Customer

FIGURE 5 The stored procedure's online report notes the action taken for each index



The second parameter tells the stored procedure to either display the indexes and their percentage of fragmentation (specify 'N') or defrag the indexes (specify 'Y').

For example, if you want to check the 
Customer table to see how badly its indexes 
are fragmented, you use the command 


cspDefragIndexes 'Customer', 'N' 


Table 1 shows sample results. As you can see, most of the indexes are highly fragmented; even the clustered index is more than 80 percent fragmented. This table's indexes need to be defragged, so you run the command


cspDefragIndexes 'Customer', 'Y'


The cspDefragIndexes stored procedure rebuilds indexes whose fragmentation is 30 percent or higher, reorganizes indexes whose fragmentation is between 5 percent and 29 percent, and bypasses indexes whose fragmentation is less than 5 percent. An update of the statistics completes the process. Figure 5 shows the report that cspDefragIndexes displays on screen. As you can see, the report specifies the action taken for each index.
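
The threshold rule above can be sketched in a few lines of Python. This is an illustration of the decision logic only, not the stored procedure's code. The function names are hypothetical, the sketch treats anything from 5 percent up to (but not including) 30 percent as "reorganize", and it emits the full T-SQL keyword REORGANIZE.

```python
def choose_action(frag_pct):
    """Map a fragmentation percentage to REBUILD, REORGANIZE, or None (bypass)."""
    if frag_pct >= 30:
        return "REBUILD"
    if frag_pct >= 5:
        return "REORGANIZE"
    return None


def build_statement(table, index, frag_pct):
    """Return the ALTER INDEX statement for one index, or a comment if bypassed."""
    action = choose_action(frag_pct)
    if action is None:
        return f"-- bypassed {index} ({frag_pct:.2f}% fragmented)"
    return f"ALTER INDEX {index} ON {table} {action}"


# Three rows taken from Table 1.
for index, frag in [("PK_Customer", 81.71806),
                    ("FK_CustCSPID", 20),
                    ("FK_MasterCustID", 4.66666)]:
    print(build_statement("dbo.Customer", index, frag))
```

Applied to the Table 1 figures, the rule reproduces the actions reported in Figure 5: PK_Customer is rebuilt, FK_CustCSPID is reorganized, and FK_MasterCustID is bypassed.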

A quick rerun of cspDefragIndexes in display mode shows the improvements made by the defrag operation. As Table 2 shows, the percentage of fragmentation is significantly lower, which means the queries against the Customer table will run significantly faster.

I wrote cspDefragIndexes for use on SQL Server 2005 Standard Edition. (It won't work on SQL Server 2000.) This stored procedure will incur table locks unless you have Enterprise Edition and you modify the procedure to do online rebuilds.
—Eric Peterson, President, 

Peterson American Consulting 

InstantDoc ID 96059 


TABLE 2 Percentage of Fragmentation After the Defrag Operation

TableName  IndexName               IndexId  Part  FragPct  Fill  IndexType     Unique  PKIndex
Customer   PK_Customer             1        1     0        0     CLUSTERED     Y       Y
Customer   FK_CustCSPID            2        1     20       0     NONCLUSTERED
Customer   IX_CustNo               3        1     0        0     NONCLUSTERED
Customer   IX_LastName             4        1     0        0     NONCLUSTERED
Customer   IX_FirstName            5        1     0        0     NONCLUSTERED
Customer   IX_Customer_DBA         6        1     0        0     NONCLUSTERED
Customer   FK_CustAcctsRecID       7        1     0        0     NONCLUSTERED
Customer   IX_Cust_RateID          8        1     8.33333  0     NONCLUSTERED
Customer   FK_CustSiteAddrID       9        1     5        0     NONCLUSTERED
Customer   FK_CustMailAddrID       10       1     8.33333  0     NONCLUSTERED
Customer   IX_Customer_CorrAddrID  11       1     8.33333  0     NONCLUSTERED
Customer   FK_PrevCustID           12       1     5        0     NONCLUSTERED
Customer   FK_MasterCustID         13       1     4.66666  0     NONCLUSTERED
Customer   IX_Customer             14       1     0        0     NONCLUSTERED  Y

New features in the upcoming System Center releases make it easier to manage SQL Server and protect your databases

by John Savill

SQL Server 2005 needs the support of a number of Windows services to function reliably. Microsoft is updating and rebranding its systems management products, Microsoft Systems Management Server (SMS) and Microsoft Operations Manager (MOM), in its System Center product family. The enhanced capabilities in the new System Center products, particularly Microsoft System Center Operations Manager 2007, which is available now; Microsoft System Center Data Protection Manager (DPM) 2007, which was in beta at publication time; and Microsoft System Center Configuration Manager 2007, also in beta at press time, will support SQL Server 2005 in a number of ways. (Note that Operations Manager 2007 can also monitor SQL Server 2000.) Let's take a tour of these products and see what they offer to help SQL Server DBAs keep a closer watch on their systems. (For a quick look at all the components that make up the new System Center, see "What's in Microsoft System Center 2007?" page 16.)


Operations Manager 2007

Operations Manager 2007 is based around IT service models. For the first time, it's possible to manage an IT service from an end user's perspective (i.e., as a single service) instead of as the separate components (e.g., server, application, disk space, Active Directory, or AD) that the service uses. Operations Manager still monitors these components individually but groups all required components for a particular service together so that, for example, if AD were unavailable, any service that relied on AD would be shown as having a problem.

In most environments, Operations Manager runs an agent 
on each monitored server or workstation (an agentless operation 
mode is also available). The agent basically only stores and relays 
information back to the Operations Manager server. The true 
power of Operations Manager lies in its management packs, 
which are product- or feature-specific packs of knowledge 
that are installed on servers that Operations Manager moni- 
tors. The management packs installed on a server depend on 
the software or features it’s running; for example, on a domain 
controller (DC) running DNS, you'd install the base OS, AD, 
and DNS management packs. Management packs are available
for many Microsoft products; you can find a complete list
of Operations Manager management packs at
http://www.microsoft.com/technet/prodtechnol/mom/catalog/catalog.aspx?vs=2005.
(You can download the Microsoft SQL Server
Management Pack for MOM 2005 at
http://www.microsoft.com/downloads/details.aspx?FamilyID=79f151c7-4d98-4c2b-bf72-ec2b4ae69191&DisplayLang=en.)
The management pack contains large amounts of information
about the product (basically everything about the product that’s
in the Microsoft Knowledge Base) and its use of the environment.
It tells the agent which performance counters to watch, which
registry values to check (these are important in configuration
monitoring), and which other factors should cause the agent to
preemptively alert IT and take steps to avoid potential problems
detected by the management pack.

Although Operations Manager provides the standard “x is down,
go fix it” reporting, to which it generally will add information
about how to fix the problem, the product’s ability to warn
preemptively is its most useful feature. You can configure a
granular level of alerting. For example, you could configure
Operations Manager to send alerts related to your SQL Server
service to the DBA team and IT administrators. Or, you could set
up Operations Manager to initially alert the Help desk, then, if
the problem hasn’t been resolved after a certain time period,
escalate the alerts by paging another group, and so on until the
problem is fixed.


The SQL Server Management Pack 

The SQL Server Management Pack provides a discovery 
component that lets Operations Manager examine instances, 
databases, file groups, files, agent jobs, and SQL Server roles in 
a SQL Server environment. You can fully customize the aspects 
you want to monitor and actions to perform. The management 
pack’s event-analysis function monitors all the key aspects of 
the SQL Server environment, such as clustering, log shipping, 
backup, SQL Server Agent, and, as mentioned, SQL Server 
roles (e.g., replication). In addition to providing overall views 
of the SQL Server system’s health, the management pack also 
provides in-depth views of databases, the database engine, SQL 
Server Reporting Services (SSRS), SQL Server 2005 Integra- 
tion Services (SSIS), and other SQL Server components. The 
main monitoring screen in Figure 1 provides a high-level status 
view of the computers that Operations Manager is monitoring. 
In the Computers section, you can add different columns to the 
components being monitored, such as database functionality. 
(In this example, I’ve added the column SQL DB Engine.) You 
select the columns to add to the main monitoring screen by 
right-clicking the column heading. 

FIGURE 1 Operations Manager high-level monitoring view

After the SQL Server Management Pack has identified all
the attributes and components that need to be watched, the
management pack with Operations Manager can start monitoring
the environment. The management pack provides three
core types of monitoring:

• Availability monitoring: At a basic level, the agent verifies
  that the database can be contacted by creating a synthetic
  database-connection transaction. The agent then checks the
  status of the services that SQL Server uses, agent jobs, the
  state of any backups, and replication state. The agent looks
  at around 400 different SQL Server events and any other
  occurrences that might affect availability.

• Performance monitoring: Operations Manager monitors
  core items such as caching ratio, user connections,
  processor utilization per instance, database and log size
  and growth (both in percentage and absolute terms), and
  response times to client requests.

• Configuration monitoring: The management pack
  understands the recommended best practices and applies
  this knowledge to the SQL Server systems being monitored.
  Operations Manager will generate alerts when best
  practices aren’t being followed. For example, Operations
  Manager will provide an alert if it sees database
  configurations such as Auto Close or Auto Shrink enabled.


Another feature that will appeal to both SQL Server novice
users and experienced DBAs is the set of new ways to access
functions that can aid in running SQL Server. As Figure 2 shows,
if you select SQL DB Engine Tasks in the Actions pane of the
Microsoft Management Console (MMC) System Center Operations
Manager 2007 snap-in, you’ll see actions related to the database
(for example, access to SQL Server tools, configuration options,
and control of SQL Server services). Not only does Operations
Manager let you see what’s happening, it gives you actions for
the relevant component. This applies to any item you select; for
example, if you select a computer and not a component, you’ll
see options to list active sessions, processes, and other
information relevant to the computer. As you can see, Operations
Manager is very much a management solution and not only a
monitoring tool.

It’s important to remember that Operations Manager is also a
trend-based tool. It will, of course, tell you about an impending
problem or whether a problem has occurred. However, Operations
Manager also tracks historical data, so that you can see relative
performance of your SQL Server environment over a period of
days, weeks, or months, depending on the frequency you’ve set
for capturing metrics and the amount of database space that
you’ve allocated to storing historical data.


What’s in Microsoft System Center 2007?

The next release of Microsoft System Center will
include the following products:

• Microsoft System Center Configuration Manager 2007:
  the next version of SMS

• Microsoft System Center Operations Manager 2007:
  the next version of MOM

• Microsoft System Center Data Protection Manager 2007:
  a continuous data protection (CDP) and recovery solution

• Microsoft System Center Reporting Manager:
  a systems management data warehouse and reporting platform

• Microsoft System Center Capacity Planner 2007:
  a product that helps size deployments of Microsoft Exchange
  Server 2007 and Operations Manager 2007

• Microsoft System Center Virtual Machine Manager 2007:
  a product that assists in server consolidation and virtual
  machine (VM) deployment and provisioning

• The code-named “Service Desk” product, which enables
  self-service Help desk portals for users and includes a
  Configuration Management Database (CMDB), which provides
  information about the current and desired state of your
  enterprise’s computers

• Microsoft System Center Essentials 2007:
  an IT management solution that’s geared toward midsized
  businesses and is based on Operations Manager 2007, Windows
  Server Update Services 3.0 (WSUS), SQL Server 2005, and
  Microsoft Update


Data Protection Manager 2007 
The upcoming new version of DPM protects systems running
Windows 2000 or later and runs on any Windows Server 2003
or Windows Storage Server 2003 server. Like its predecessor,
DPM 2007 requires AD, SQL Server 2005, and SSRS. DPM 2007
is targeted primarily at distributed environments. The product
works with an agent running on every server that DPM is
protecting. The agent captures byte-level changes in real time
and also once an hour by default (you can change the default to
any value, using 15-minute increments). The agent then sends
these byte-level changes back to the central DPM server, which
allows you to configure DPM to take snapshot views of server
data at various points in time (up to 512 shadow copies in DPM
2007, compared with 63 shadow copies in the earlier version).
A typical setup is to have DPM create three snapshots a day,
say at 9:00 A.M., noon, and 3:00 P.M. An end user can even
restore a database, for example, via DPM without administrator
intervention.

One of the most significant changes in the new version is
DPM’s integration with tape backups. You can now back up
initially to disk, then grandfather data from the disk backup
to tape as the data reaches a certain age. Another important
change, especially for SQL Server DBAs, is that DPM provides
improved continuous data protection


FIGURE 2 Viewing database tasks via the Operations Manager console 
In this Essential Guide, we will look at best practices for building a “perfect database” from a
database developer's perspective. Everyone has a slightly different idea of the perfect database.
But the most useful databases share some characteristics. Following are seven functional
features that the perfect database should have:

• Supports your company's business requirements
• Built so you can extend functionality as the business evolves
• Built so you can expand scope as the company embraces additional lines of business
• Built to scale easily; that is, the database can become larger in size without needing an
  equivalent increase in resources
• Protects the integrity (and security) of the data
• Delivers good performance
• Built to use and maintain easily


Any database that can meet all seven of these criteria 
meets my definition of a "perfect database." The best 
way to begin is to consider the best practices of data- 
base design, data modeling, and database development 
(including how you write and debug your SQL code). 
Then you validate the schema against business require- 
ments. If you can find software packages to help you 
with these tasks, then creating the “perfect database” 
will take less time than if you had to do it manually. 
Following these best practice guidelines will get you 
well on your way to building a perfect database. 


Database Optimization Theory rests on the five fol- 
lowing concepts: 

e A clean database schema 

e Use of set-based code (T-SQL, in this case) 

e Judicious use of indexing 

e Enhancing concurrency 

e Tuning the server 


As a database developer, tuning the server is not 
within your purview—that’s the DBA’s job, so we 
won't address these issues here. Assuming that your 
requirements analyst has perfectly captured the busi- 
ness requirements, and assuming that your data mod- 
eler has perfectly mapped these business requirements 
to a clean database schema, we can start from a perfect 
beginning. 


Validating the Schema 

Too many database developers don’t take part in
validating the database schema, either by choice or
by command. However, you should be part of the
team that does the validation. You’re the person who’s
going to write the code that touches the database. You
need to understand the schema, to understand how it
relates to the business requirements, and to understand
all the intricacies of the interrelated tables and objects.


You can validate a schema in one of two ways. First, 
you can validate it against the business requirements, 
as captured by the requirements analysts. The second 
method is to determine how well the schema con- 
forms to relational standards. 


Validating a schema against business requirements can
be a difficult task, and is best accomplished by team
effort. You and your fellow team members can conduct
a “design walk-through.” The design walk-through
should be part of the data modeling process. Additionally,
once you have the schema in hand, you’ll want to
do a developer walk-through so you can see that the
physical design will support your requirements for SQL
development. As a database developer, you will have a
different perspective and differing (and perhaps more
pragmatic) needs than the data modelers. Together, you
can step through the design, acting out various business
scenarios, and making any last-minute design changes
and additions that will be necessary to support your
development efforts. Using data modeling tools created
specifically to support logical and physical database
models, you can quickly view and test the schema.
You'll be able to identify the views of data that you'll
need to support your programming. If you know that
you'll need flag fields or work tables to support various
applications, now is the time to discuss these with the
data modelers and DBAs.


The second type of schema validation is one that
needs to be done by someone who is extremely
familiar with standards of relational database design.
The objects and structures contained within a database
schema can be very simple or they can be
very complicated. This same individual should be
part of the design walk-through, so that he or she
fully understands the business case that gave rise to
this database schema. If this person is not you, then
it should be someone with whom you can work
closely. Standards of relationality are well known and
well documented, and this person should be prepared
to enforce them in any new database development
effort. Fortunately, most data modeling software
packages, such as that shown in Figure 1, have built-in
support for these relational database design standards,
which facilitates this type of validation activity.


Even when you know what to look for when validat- 
ing the schema, how do you get all this done within 
some reasonable timeframe? If you had to do it by 
hand, I’m not sure it would be possible. However, 
you're not limited to manually “walking the schema.” 
Software tools are available to help you with this task. 
Some tools enable developers to simulate changes 
before implementing them—essentially, you can test 
“what-if” scenarios and validate the design changes 
before subjecting your production environment to 
modifications that could have a negative impact on 
production performance or data integrity. 


You should assemble a software toolkit that will help
you do your job better and more quickly. Smart
developers have an arsenal of native and third-party
tools to help them be more effective and more efficient
while they’re building the perfect database. If
you're working in a multi-platform environment,
you would be wise to invest in and use third-party
tools that operate with the same look and feel across
all databases (SQL Server on Windows, Oracle on
UNIX, etc.). Familiarity with your tools will make
your work go faster, and will speed you on your
way to becoming a Master Developer who routinely
constructs “perfect databases.”


Indexing 

Indexing for performance will help you create the 
perfect database. As you already know, each SQL 
Server primary key column is, by default, indexed 
and clustered. Pay attention to this; do you want the 
data in a table stored on the hard disk in primary key 
order? The answer depends on how you're going to 
use the data. Most likely, this decision will be a team 
compromise, because most data is used in many ways. 


What to Look for
When Validating a Schema

Create an acceptance checklist that you can
use when validating the relationality of the
schema. As a developer, you will look for
such things as relationships (both explicit
and implied), keys (both primary and foreign),
and consistency in data types. Even spelling
becomes a concern; if an object name
is spelled incorrectly, you should correct it.
Nothing is more aggravating than having to
remember to misspell an object name every
time you write a piece of code around it!

You'll want to pay special attention to default
values, constraints, and calculated columns.
Default values and constraints are database
objects used to express business rules and
restrictions. As a developer, you need to
decide whether to store these rules and
restrictions inside the database, or in some external
business rules layer. There are pros and cons
to each approach. My rule of thumb: If the
constraint is very limited and very static, such
as “Y” or “N,” then creating a column
constraint is the most efficient method of storing
and implementing the rule. If the constraint is
complex, spans more than a single table, or
you suspect that it will change over time
(possibly in the near future), then writing that
constraint in code and storing it in a middleware
layer (called the business rules layer) is a better
approach. A good data modeling tool can help
you check and track objects like these.


Whenever you have a one-to-one or one-to-many
relationship between two tables, you'll have a foreign
key column in the table that is on the “many” side
of the relationship. SQL Server doesn’t automatically
index the foreign key. To ensure best performance
on joins, make sure that every foreign key in the
database is indexed.
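
As a minimal sketch of indexing a foreign key, the T-SQL below creates a parent table, a child table with a foreign key, and then an explicit index on the foreign key column. The table, column, constraint, and index names are all hypothetical and chosen only for illustration.

```sql
-- Illustrative only: every object name here is hypothetical.
CREATE TABLE dbo.Customer (
    CustomerID int         NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,  -- clustered by default
    LastName   varchar(50) NOT NULL
);

CREATE TABLE dbo.CustomerOrder (
    OrderID    int      NOT NULL CONSTRAINT PK_CustomerOrder PRIMARY KEY,
    CustomerID int      NOT NULL
        CONSTRAINT FK_CustomerOrder_Customer REFERENCES dbo.Customer (CustomerID),
    OrderDate  datetime NOT NULL
);

-- The FOREIGN KEY constraint does not create an index on the child column,
-- so add one explicitly to support joins back to dbo.Customer.
CREATE NONCLUSTERED INDEX IX_CustomerOrder_CustomerID
    ON dbo.CustomerOrder (CustomerID);
```

Whether the index should be clustered or nonclustered, and whether it needs additional columns, depends on how the table is actually queried.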


The next candidate for indexing is any column that
will be used for sorting: any column consistently
used in an ORDER BY clause of a SQL query. You
should have a good idea of how the most important
queries will be structured; therefore, you should
know which columns will be used for ordering the
data. You'll also want to index columns that will be
used for restricting the returned data sets, such as
those that consistently appear in the WHERE clause,
especially if you're going to use range-of-value
conditions. If you can, use software tools to help you
analyze these indexes as they’re used in test and
production, and evaluate carefully any recommendations
these tools may make regarding changes or
modifications to the indexes themselves.
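
The following sketch shows one way such an index might look for a column that appears in both a range condition and an ORDER BY clause. The dbo.Invoice table and all of its names are assumptions made up for this example.

```sql
-- Illustrative only: dbo.Invoice and its columns are hypothetical.
CREATE TABLE dbo.Invoice (
    InvoiceID   int      NOT NULL CONSTRAINT PK_Invoice PRIMARY KEY,
    InvoiceDate datetime NOT NULL,
    Amount      money    NOT NULL
);

-- Index the column that is used for range restriction and for sorting.
CREATE NONCLUSTERED INDEX IX_Invoice_InvoiceDate
    ON dbo.Invoice (InvoiceDate);

-- A range-of-values query that the optimizer can satisfy with a seek on
-- the index, reading rows already ordered by InvoiceDate.
SELECT InvoiceID, InvoiceDate
FROM dbo.Invoice
WHERE InvoiceDate >= '20070101'
  AND InvoiceDate <  '20070201'
ORDER BY InvoiceDate;
```

The optimizer still decides whether to use the index, so check the actual plan against realistic data volumes.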


Database Programming 

Transact-SQL (T-SQL) is the native language of 
SQL Server. SQL Server has been optimized to 
process sets of data (rather than row-at-a-time); you 
should strive to make T-SQL your standard pro- 
gramming language. 


Avoid SELECT * whenever possible. SELECT 
* will return the entire row, all columns. If an appli- 
cation is expecting n columns of data in the result 
set, and it receives n+1 columns (because someone 
altered the table and added an additional column), 
this could cause the application to fail. 

Best practice: Explicitly request only the columns 
that you need. 
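
A small sketch of the difference, using a hypothetical dbo.Emp table: after a column is added, the SELECT * result changes shape, while the explicit column list does not.

```sql
-- Illustrative only: dbo.Emp and its columns are hypothetical.
CREATE TABLE dbo.Emp (
    EmpID    int         NOT NULL CONSTRAINT PK_Emp PRIMARY KEY,
    LastName varchar(50) NOT NULL,
    HireDate datetime    NOT NULL
);

-- Fragile: the shape of the result set follows the table definition.
SELECT * FROM dbo.Emp;

-- Someone later adds a column ...
ALTER TABLE dbo.Emp ADD Phone varchar(20) NULL;

-- ... so SELECT * would now return four columns instead of three,
-- while this explicit list still returns exactly three.
SELECT EmpID, LastName, HireDate
FROM dbo.Emp;
```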


Use qualified table names when writing T- 
SQL queries. If you’re logged on to SQL Server 
as “SamS” and you execute a query like “SELECT 
* FROM Emp,” SQL Server will first look in the 
database catalog for the existence of a table called 
SamS.Emp. If it can’t find that, then it will look for a 
table called dbo.Emp. Multiple searches through the 
database catalog can get very expensive, especially
with ad hoc query workloads, which typically don’t
fully qualify object references. To qualify a table
called Emp in SQL Server 2000, use ownername.Emp;
in SQL Server 2005, Emp would be called
schema_name.Emp. If you're writing cross-database
code, then use database_name.schema_name.Emp to
qualify the table object.

Best practice: Qualify the object names. 


Stored procedures—use them. Call them from 
external programs instead of assembling SQL dynami- 
cally and sending queries across the Internet to the data- 
base. Stored procedures are SQL Server objects, stored 
in the database and listed in the database catalog. This is 
the swiftest, most efficient, and safest method of query- 
ing a database. 

Best practice: Always use stored procedures. 
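
As a rough sketch, a stored procedure that takes the search value as a parameter might look like the following. The dbo.Emp table and the dbo.GetEmpByLastName procedure are hypothetical names, and GO is the batch separator used by client tools such as sqlcmd and SQL Server Management Studio, not a T-SQL statement.

```sql
-- Illustrative only: all object names are hypothetical.
CREATE TABLE dbo.Emp (
    EmpID    int         NOT NULL CONSTRAINT PK_Emp PRIMARY KEY,
    LastName varchar(50) NOT NULL,
    HireDate datetime    NOT NULL
);
GO

-- CREATE PROCEDURE must be the first statement in its batch.
CREATE PROCEDURE dbo.GetEmpByLastName
    @LastName varchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    -- Qualified table name, explicit column list, parameterized predicate.
    SELECT EmpID, LastName, HireDate
    FROM dbo.Emp
    WHERE LastName = @LastName;
END;
GO

-- The calling program passes a value instead of assembling SQL text.
EXEC dbo.GetEmpByLastName @LastName = 'Smith';
```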


To minimize recompiling queries, don’t hard-code
values. When these queries are compiled, SQL
Server writes an execution plan that it caches, so it can
re-use that plan when the identical query is re-executed.
SQL Server determines when a query execution
plan is obsolete. This happens because of changing
index statistics, or mixed DDL and DML in a stored
procedure, among other things. If SQL Server has to
recompile the query every time it’s run, this can lead
to serious performance degradation.

Best practice: Parameterize your queries.


Limit the code that you include within a transaction
(from the BEGIN TRANSACTION to the
COMMIT statements that explicitly define a business
transaction) to as few SQL statements as possible in
order to avoid blocking other queries. Include only data
modification statements inside this bracketed transaction;
if possible, don’t include SELECT statements. If the
transaction isolation level in force is the default (READ
COMMITTED), then even a simple SELECT statement
will set a share lock on the row or page accessed.
This can slow down other queries, and even temporarily
interfere with modification statements.

Best practice: Include in the explicit transaction only
those lines of code that you absolutely need.


All of these querying tips are best practices; with time
you'll learn about them and incorporate them into
your routines. To shorten the learning curve, look for
tools that will quickly identify errors by stepping you
through the code as it executes, monitoring variables,
call stacks, and dependencies. Software that will help
you troubleshoot your database logic and that will
highlight code that could create performance
bottlenecks would be worth its weight in gold.


Figure 1 Schema Examiner visually represents diagnostics


A “perfect database” is far easier to maintain
than one in which flaws have been institutionalized
and all subsequent downstream operations
have had to use temporary schemes to compensate
for poor design and development decisions.


Program for Performance


As a responsible database developer, you should
constantly monitor query performance because
you'll certainly be called upon to troubleshoot
query performance.

Poor query performance is universal; it won’t
limit itself to one specific database or one specific
platform. You should be using code debuggers and
optimizers and, of course, you should be able to profile
your code as it's running. Here’s a short checklist of
things to be on the lookout for:


Remove all restrictive table, query and join hints
that you absolutely don’t need. The SQL Server
optimizer is very powerful and very clever; it will most
often choose an optimal execution plan (“optimal”
meaning the most efficiency for the least cost). When you
insert hints into your queries, you're constraining
the optimizer. Hints like “SELECT * FROM tablename
WITH (TABLOCK)” can block other queries
from executing and create havoc within the SQL Server.


Rule of thumb: Don’t use restrictive hints unless you 
absolutely need them. 


Use the least restrictive transaction isolation level,
so that you optimize transaction throughput and minimize
blocking. For most production operations, the default
transaction isolation level, READ COMMITTED,
will work fine. Once in a while, you'll have to use
a more restrictive transaction isolation level (such as
REPEATABLE READ or SERIALIZABLE) because
you'll have to read the same exact data set before and
after a data modification. These are usually rare events.
You can modify the transaction isolation level just before
running the restrictive query set, then reset it to READ
COMMITTED after you’re done. If you change the
transaction isolation level within a stored procedure,
user-defined function, trigger, or user-defined type, when
execution completes and the object returns control to
the calling routine, the transaction isolation level setting
should return to that of the calling program. If you’re
using SQL Server 2005, you can use the new
READ_COMMITTED_SNAPSHOT database option, assuming
that you can use a snapshot instead of the live data.

Rule of thumb: Use the least restrictive transaction 
isolation level that you can. 
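
A minimal sketch of raising the isolation level only around the statements that need it, then returning to the default. The dbo.Account table and its data are hypothetical and exist only to make the example self-contained.

```sql
-- Illustrative only: dbo.Account is a hypothetical table.
CREATE TABLE dbo.Account (
    AccountID int   NOT NULL CONSTRAINT PK_Account PRIMARY KEY,
    Balance   money NOT NULL
);
INSERT INTO dbo.Account (AccountID, Balance) VALUES (1, 500);

-- Tighten the isolation level only for the work that requires it.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
    -- Read, modify, and re-read the same row inside one short transaction.
    SELECT Balance FROM dbo.Account WHERE AccountID = 1;
    UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = 1;
    SELECT Balance FROM dbo.Account WHERE AccountID = 1;
COMMIT TRANSACTION;

-- Return the session to the default level afterward.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

Keeping the transaction this short limits how long the stricter locks are held.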


Examine the execution plan before releasing any
code to production. The execution plan that SQL Server
chooses is dependent on many conditions, and this test
is only as good as your development environment is in
mimicking your production environment. Nevertheless,
it’s a good idea to know what SQL Server plans to do.
If SQL Server plans to use (what you feel is) an unwise
operation to resolve a query, then you can take steps
to mitigate this action, such as creating a covering index
where one didn’t exist before.


Rule of thumb: Get used to evaluating execution plans. 
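
One way to look at an estimated plan from a query window is SET SHOWPLAN_TEXT, sketched below against a hypothetical dbo.Invoice table. While the setting is ON, SQL Server returns the plan instead of running the statement. The covering index at the end uses the INCLUDE clause, which requires SQL Server 2005 or later. All object names are assumptions, and GO is the client-tool batch separator.

```sql
-- Illustrative only: dbo.Invoice and its columns are hypothetical.
CREATE TABLE dbo.Invoice (
    InvoiceID   int      NOT NULL CONSTRAINT PK_Invoice PRIMARY KEY,
    CustomerID  int      NOT NULL,
    InvoiceDate datetime NOT NULL,
    Amount      money    NOT NULL
);
GO

-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- Not executed while SHOWPLAN_TEXT is ON; the estimated plan is returned.
SELECT CustomerID, Amount
FROM dbo.Invoice
WHERE InvoiceDate >= '20070101';
GO

SET SHOWPLAN_TEXT OFF;
GO

-- If the plan shows a scan you consider unwise, a covering index is one option.
CREATE NONCLUSTERED INDEX IX_Invoice_InvoiceDate_Covering
    ON dbo.Invoice (InvoiceDate)
    INCLUDE (CustomerID, Amount);
```

Rerunning the SHOWPLAN batches after creating the index shows whether the optimizer chooses to use it.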


Use a profiler. An operational profiler can be 
invaluable when it comes to troubleshooting query 
performance. You can see the actual query plan that 
was used to resolve a query in production, as compared 
to the plan shown at compile time. SQL Server can 
dynamically update the query plan at execution time. 
This happens when a stored procedure is called with 
different input values, when statistics have been 
auto-updated, or if there’s a change in available 
resources (such as CPU and memory) between 
compile time and run time. SQL Server will detect 
these discrepancies and alter the execution plan 
accordingly. SQL Server will incur some overhead 
during these operations, so use profiling sparingly. 
Rule of thumb: Get cozy with a profiler. 


Where Does the CLR Fit? 


The Common Language Runtime (CLR), introduced
with SQL Server 2005, was created to let developers
leverage their programming skills in languages
such as VB.NET and C#.NET. Although the .NET
platform enables you to program in a managed code
environment (always helpful!), and offers a rich set
of structures and capabilities inherent in these base
languages (arrays, namespaces, classes, and structured
exception handling), it can inhibit your migration
toward set-level programming (T-SQL). The CLR
languages (also known as “managed code”) can
be much better than T-SQL at pure number manipulation,
at managing highly complicated execution logic, and at
string handling, especially for routines that are
purely computational and that don’t need access to
the data in the database. During initial 2005 testing,
some reports indicated that scalar (number-crunching)
functions written in managed code ran up to 100
times faster than the same UDF written in T-SQL.
However, be aware that there have been reports of
diminished performance when using API server cursors
(cursors implemented on the server and managed
by API cursor functions) or keyset-driven cursors
(a scrollable, updateable cursor driven by a set
of identifiers known as the keyset). Tests have shown
that for pure data manipulation (selects, inserts,
updates, deletes), T-SQL is still much faster than
using the CLR.


Re-validate the schema. If you, as a developer, have 
the authority and permission to add architectural objects 
to the database, such as indexes, then you'll want to 
include another round of schema validation in your rou- 
tine. You don’t want to “break” anything that’s already in 
production, so before you hand off to the Quality Check 
& Assurance folks, or to the Database Administrator for 
inclusion into the production copy of the database, you'll 
want to re-check the schema to ensure that the changes 
you’ve made are consistent with operational standards. 
Rule of thumb: Make sure your changes won’t break 
anything already in production. 


The Effort is Worth It 


Despite the title of this paper, we all know that no one 
can really build a perfect database. However, using best 
practices of database design, development, and program- 
ming will go a long way toward helping you build a data- 
base that is perfect for you. If you can find software pack- 
ages to help you design the schema, write and debug your 
SQL code, monitor performance and help you validate 
the schema against business requirements, then creating 
that “perfect database” will take less time than if you had 
to do it manually. Nevertheless, it’s worth the cost in time 
and effort to try to build that perfect database. 
(CDP) and backup for SQL Server (as well
as for Microsoft Exchange Server 2007 and 
Exchange 2003 and Windows SharePoint 
Services 3.0 and 2.0), compared with the 
earlier DPM version. 

DPM’s SQL Server support relies on the
SQL Server Volume Shadow Copy Service
(VSS) Writer to capture disk changes. After
you install the DPM agent and reboot SQL
Server, you can use the DPM Administrator
Console to create a new protection group,
which will display all the available members
(i.e., servers) that could be included in the
group. The agents running on the servers
pass information to the DPM console, so
that when you expand a server to view its
details, you'll see basic information, such
as volumes and shares. There’s also a cool
new feature that lets you select a share for
snapshotting, for which DPM will automatically
locate the data and set any needed
ACLs. The console also displays application-specific
information; for example, on a SQL
Server system, expanding the server one
level displays the SQL Server instances running
on the server. Expanding each instance
displays the various databases that are hosted
in the instance.

You can set the protection frequency for
the selected databases; DPM can create a
snapshot as often as every 15 minutes (up
to 512 snapshots total, as mentioned earlier).
Via the VSS Writer, DPM can send only the
updated blocks or fragments of the database
to the central DPM server, a backup method
that minimizes overhead on the network
and makes restorations faster.

To recover SQL Server data, you use the 
DPM Administrator Console’s Recovery 
tab to select either a point-in-time snapshot 
that’s stored in the DPM server or simply 
opt to restore the most recent (“latest”) ver- 
sion. If you use the best practice of keeping 
the database and transactions on separate 
disks and you want to restore a SQL Server 
database after a corruption or loss, opting 
to restore the most recent version restores 
the latest available database snapshot to its 
original location in the database, then plays 
back any missing transactions. Using this 
restore option should effectively restore the 
latest data on the database with no loss and 
without involving the SQL Server DBA. 
Additionally, as Figure 3 shows, DPM provides
options to recover the database into
a new database; recover the actual database
files to a location on disk, which an experienced
SQL Server DBA can then use to
perform a recovery; or “restore” (that is,
copy a snapshot of a point in time) to a
tape.


Configuration Manager 2007 
Configuration Manager is involved in 
keeping SQL Server systems up to date by 
ensuring that approved OS updates and SQL 
Server patches are applied in a controllable, 
reportable fashion. Configuration Manager 
provides a centralized method for deploying 
updates and software, which helps ensure a 
consistent Windows 
such as DCs, DNS servers, and application servers that rely on SQL Server for their data storage. Configuration Manager also pushes out software and configurations, such as updated SQL Server clients and configuration, to computers in the enterprise. Other Configuration Manager capabilities, such as its ability to inventory client and server hardware and software, can help you determine actions that you might need to take related to your SQL Server environment and could also help with troubleshooting, letting you quickly see which aspects of a system’s hardware setup could be causing performance problems.

Configuration Manager lets you establish a “desired configuration,” that is, a configuration baseline that describes how you want a box to look in terms of installed software and other configuration parameters. When you use the desired configuration feature along with Group Policy, Configuration Manager can help ensure that SQL Server systems adhere to a preferred configuration (e.g., which users have local access) and help keep the computers in the enterprise correctly configured for communication with the SQL Server environment. Many production problems stem from configuration errors, and the desired configuration feature can help you avoid them.


Adding Value to SQL Server 


System Center’s value for a SQL Server environment depends on the size of your environment and your current facilities. To get the most benefit, you need to understand the products, especially Operations Manager 2007, which provides a huge amount of information but, without proper and educated tuning, can quickly bury a monitoring team in warnings and alerts. DPM 2007 provides a great backup and protection solution that’s tailored to how SQL Server actually works. And Configuration Manager can help you ensure that your SQL Server systems are updated consistently and conform to a desired configuration standard.
Nothing beats a single SELECT statement


Calculating the number of work days elapsed between two dates (i.e., the business age) is one of the most common tasks in business application development. In many situations, this computation plays a significant and sometimes crucial role in the application logic. For example, suppose that a company promises to ship orders within five business days of the payment date, and your assignment is to find all the orders that shipped later than this constraint allows. In other words, you must find all the orders in which more than five business days elapsed between the payment date and the ship date.

The common approach to this problem is to use a single-row method. This approach calculates the business age for only one row in a table at a time and compares the result with the required age. Then the code scrolls to the next record and repeats the calculations. This method is typically implemented either within application front-end source code (e.g., Visual Basic, Delphi, C#) or by using more complicated T-SQL routines that loop through a data set, fetch a row, and calculate the business age. Although these routines are workable solutions, they have major drawbacks. Front-end code increases network traffic because the SQL Server machine must send all rows to a client for further processing rather than only the set of rows that meets the business age requirements. In addition, SQL Server stored procedures introduce unnecessary complexity because you must create a loop to check every record against your criteria.

In this article, I offer a solution that uses only one SELECT command for all the rows in a data set. This bulk approach is both simple and elegant. Note that the examples and formulas I use are for US date calculations. Readers outside the United States will need to modify my approach. (For more information about working days calculation, see “DATETIME Calculations, Part 5,” June 2007, InstantDoc ID 95675.)


Solution 

The first step is to write a T-SQL SELECT 
statement to show business age as a calcu- 
lated column in a SELECT clause. Then, 
you need to write a statement that incor- 
porates business age into a WHERE clause 
to include only rows that meet certain 
aging criteria. Advanced developers can also 
incorporate the logic into a user-defined 
function (UDF). This solution lets you write 
a single routine that you can call anytime 
you need the calculations. 

Weekend identification (i.e., Saturday 
and Sunday) is based on a number that 
corresponds to the day of the week. This 
number depends on the value set by the 
SET DATEFIRST command, which sets 
the first day of the week. For US regional 
settings, Sunday is the first day of the week, 
Monday is the second day of the week, and 
Saturday is the seventh day of the week. 
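
The effect of SET DATEFIRST on the day-of-week number can be seen directly. The following is a minimal sketch, not part of the article’s listings: the two dates are simply a Saturday and a Sunday from January 2006, and the last query uses a commonly seen normalization expression that is an assumption added here for illustration.

```sql
-- 1/7/2006 is a Saturday; 1/8/2006 is a Sunday.
DECLARE @sat smalldatetime, @sun smalldatetime;
SET @sat = '20060107';
SET @sun = '20060108';

SET DATEFIRST 7;  -- Sunday is day 1 (the US English default)
SELECT DATEPART(dw, @sat) AS sat_dw,   -- 7
       DATEPART(dw, @sun) AS sun_dw;   -- 1

SET DATEFIRST 1;  -- Monday is day 1
SELECT DATEPART(dw, @sat) AS sat_dw,   -- 6
       DATEPART(dw, @sun) AS sun_dw;   -- 7

-- Assumption (not from the article): this expression yields Sunday=1 ... Saturday=7
-- whatever the current DATEFIRST setting is.
SELECT (DATEPART(dw, @sat) + @@DATEFIRST - 1) % 7 + 1 AS sat_normalized,  -- 7
       (DATEPART(dw, @sun) + @@DATEFIRST - 1) % 7 + 1 AS sun_normalized;  -- 1

SET DATEFIRST 7;  -- back to the setting that the article's formulas assume
```

Because the formulas in this article test for the values 1 and 7, they return correct results only while Sunday is configured as the first day of the week.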

A company’s holidays might be unique; 
therefore, few businesses use the same hol- 
iday schedule. The most common approach 


TABLE 1 Example Temporary Table

RECORD_NUM  DAY1       DAY2
1           1/17/2006  1/20/2006
2           1/10/2006  1/20/2006
3           1/7/2006   1/29/2006
4           1/8/2006   1/28/2006
5           1/31/2006  2/19/2006
6           2/2/2006   3/20/2006
7           2/27/2006  4/22/2006
8           4/17/2006  4/20/2006
9           4/18/2006  4/19/2006


TABLE 2 Example Temporary 
Holiday Table 


hol_date hol_desc 
2/19/2006 Almost President Day(Sunday) 
2/28/2006 Our company holiday (Tuesday) 
LISTING 1 Code to Create Temporary Table

create table #tmpTable
(record_num int identity, DAY1 smalldatetime, DAY2 smalldatetime)

INSERT INTO #tmpTable VALUES('01/17/2006','01/20/2006')
INSERT INTO #tmpTable VALUES('01/10/2006','01/20/2006')
INSERT INTO #tmpTable VALUES('01/7/2006','01/29/2006')
INSERT INTO #tmpTable VALUES('01/8/2006','01/28/2006')
INSERT INTO #tmpTable VALUES('01/31/2006','02/19/2006')
INSERT INTO #tmpTable VALUES('02/02/2006','03/20/2006')
INSERT INTO #tmpTable VALUES('02/27/2006','04/22/2006')
INSERT INTO #tmpTable VALUES('04/17/2006','04/20/2006')
INSERT INTO #tmpTable VALUES('04/18/2006','04/19/2006')

select * FROM #tmpTable


LISTING 2 Code to Create Temporary Holiday Table

create table #tmpHolidays
(hol_date smalldatetime,
hol_desc char(35))

INSERT INTO #tmpHolidays VALUES('02/19/2006','Almost President Day(Sunday)')
INSERT INTO #tmpHolidays VALUES('02/28/2006','Our company holiday (Tuesday)')


LISTING 3 Code to Calculate Business Age

SELECT RECORD_NUM,DAY1,DAY2,
DATEDIFF(day,DAY1,DAY2)
-2*DATEDIFF(wk,DAY1,DAY2)
- (select count(*) from #tmpHolidays
where hol_date between DAY1 and DAY2 and DATEPART (dw, hol_date) not in (1,7))
+(CASE WHEN DATEPART (dw, DAY1) =7 then 1 else 0 end)
-(CASE WHEN DATEPART (dw, DAY2) =7 then 1 else 0 end)
as BUS_AGE
from #tmpTable
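
To see how the individual terms of the Listing 3 expression combine, it can help to break them out for a single pair of dates. The sketch below does this for record 7 of Table 1 (2/27/2006 through 4/22/2006). The variable names, the table variable @holidays, and the column aliases are illustrative assumptions; only the expressions themselves come from Listing 3.

```sql
SET DATEFIRST 7;  -- Sunday = 1, Saturday = 7, as the formula assumes

DECLARE @DAY1 smalldatetime, @DAY2 smalldatetime;
SET @DAY1 = '20060227';  -- a Monday
SET @DAY2 = '20060422';  -- a Saturday

-- Stand-in for #tmpHolidays, holding the same two test dates
DECLARE @holidays TABLE (hol_date smalldatetime);
INSERT INTO @holidays VALUES ('20060219');
INSERT INTO @holidays VALUES ('20060228');

SELECT CD, SS, HD, S1, S2,
       CD - SS - HD + S1 - S2 AS BUS_AGE
FROM (SELECT
        DATEDIFF(day, @DAY1, @DAY2)    AS CD,   -- calendar days
        2 * DATEDIFF(wk, @DAY1, @DAY2) AS SS,   -- weekend days
        (SELECT COUNT(*) FROM @holidays
         WHERE hol_date BETWEEN @DAY1 AND @DAY2
           AND DATEPART(dw, hol_date) NOT IN (1, 7)) AS HD,  -- weekday holidays
        CASE WHEN DATEPART(dw, @DAY1) = 7 THEN 1 ELSE 0 END AS S1,
        CASE WHEN DATEPART(dw, @DAY2) = 7 THEN 1 ELSE 0 END AS S2
     ) AS parts;
-- CD = 54, SS = 14, HD = 1, S1 = 0, S2 = 1, BUS_AGE = 38
```

The result of 38 agrees with the business age that the article reports for this record.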


for dealing with holidays is to create a simple table in which to store holiday dates.

To use the solution I suggest, you must be familiar with several built-in T-SQL date functions, such as the following:

• DATEPART(dw, date): Returns a number that corresponds to the day of the week (e.g., Sunday=1, Saturday=7).

• DATEDIFF(day, startdate, enddate): Returns the number of days between startdate and enddate.
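
The solution also uses DATEDIFF with the wk datepart, which counts week boundaries crossed rather than full seven-day periods. A small sketch, with dates chosen purely for illustration:

```sql
-- DATEDIFF(wk, ...) counts the Saturday-to-Sunday boundaries crossed between
-- the two dates; for this datepart SQL Server always treats Sunday as the
-- start of the week, whatever SET DATEFIRST says.
SELECT DATEDIFF(day, '20060117', '20060120') AS days_tue_to_fri,   -- 3
       DATEDIFF(wk,  '20060117', '20060120') AS weeks_tue_to_fri,  -- 0
       DATEDIFF(wk,  '20060107', '20060108') AS weeks_sat_to_sun,  -- 1
       DATEDIFF(wk,  '20060108', '20060114') AS weeks_sun_to_sat;  -- 0
```

A one-day span from Saturday to Sunday counts as one week, whereas a six-day span from Sunday to Saturday counts as zero. This is why the business age formula needs separate corrections when either date falls on a Saturday.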


Create the Necessary Tables
First, use the code in Listing 1 to create a temporary table named #tmpTable and insert the specified dates. (Note that I use the same dates throughout the article.) Table 1 shows the temporary table that Listing 1 creates.

Next, use the code in Listing 2 to create a temporary holiday table named #tmpHolidays. This code inserts just two dates, for test purposes only. Table 2 shows the temporary holiday table that Listing 2 creates.

TABLE 3 Calculated Business Age

RECORD_NUM  DAY1       DAY2       BUS_AGE
1           1/17/2006  1/20/2006  3
2           1/10/2006  1/20/2006  8
3           1/7/2006   1/29/2006  15
4           1/8/2006   1/28/2006  15
5           1/31/2006  2/19/2006  13
6           2/2/2006   3/20/2006  31
7           2/27/2006  4/22/2006  38
8           4/17/2006  4/20/2006  3
9           4/18/2006  4/19/2006  1

Calculate the Business Age
Next, calculate the business age. The algorithm to calculate business age can be broken down into five simple steps.

Step 1: Calculate the number of calendar days (CD) between DAY1 and DAY2.
LISTING 4 Code to Derive Business Age Greater Than 10

SELECT RECORD_NUM,DAY1,DAY2,BUS_AGE
FROM (SELECT RECORD_NUM,DAY1,DAY2,
DATEDIFF(day,DAY1,DAY2)
-2*DATEDIFF(wk,DAY1,DAY2)
- (SELECT COUNT(*) FROM #tmpHolidays
WHERE hol_date BETWEEN DAY1 AND DAY2 AND DATEPART (dw, hol_date) NOT IN (1,7))
+(CASE WHEN DATEPART (dw, DAY1) =7 THEN 1 ELSE 0 END)
-(CASE WHEN DATEPART (dw, DAY2) =7 THEN 1 ELSE 0 END) as BUS_AGE
FROM #tmpTable) as D
WHERE BUS_AGE>10


CD=DATEDIFF(day,DAY1,DAY2)

Note that this calculation doesn’t count both boundary dates; for example, 1/17/2006 to 1/20/2006 yields 3, not 4.

Step 2: Calculate the number of Saturdays and Sundays (SS) by multiplying the number of weeks between DAY1 and DAY2 by two.

SS=2*DATEDIFF(wk,DAY1,DAY2)

Note that weekends and weekdays are based on country settings; my example uses US weekends.

Step 3: Find the number of holiday days (HD) between DAY1 and DAY2, excluding any holidays that happen to fall on Saturday or Sunday.

HD=(SELECT COUNT(*) FROM #tmpHolidays WHERE hol_date BETWEEN DAY1 AND DAY2 AND DATEPART (dw, hol_date) NOT IN (1,7))

Step 4: Add one day to the calculation if DAY1 was a Saturday (S1), because Step 2 subtracted both days of that weekend even though CD doesn’t count DAY1 itself.

S1=CASE WHEN DATEPART(dw,DAY1)=7 THEN 1 ELSE 0 END

Step 5: Subtract one day from the calculation if DAY2 was a Saturday (S2), because CD counts that Saturday but Step 2 didn’t subtract it.

S2=CASE WHEN DATEPART(dw,DAY2)=7 THEN 1 ELSE 0 END

The final logical formula to calculate the business age is BA=CD-SS-HD+S1-S2. The code in Listing 3 employs this formula to add a calculated business age column to Table 1. Table 3 shows the new table with the business age column included.

The code in Listing 4 builds on the previous calculation to retrieve all records in which the business age (i.e., the number of business days elapsed between DAY1 and DAY2) is greater than 10. Table 4 shows this derived data.

TABLE 4 Business Age Greater Than 10

RECORD_NUM  DAY1       DAY2       BUS_AGE
3           1/7/2006   1/29/2006  15
4           1/8/2006   1/28/2006  15
5           1/31/2006  2/19/2006  13
6           2/2/2006   3/20/2006  31
7           2/27/2006  4/22/2006  38

User-Defined Functions

As I mentioned previously, advanced developers can employ UDFs to incorporate the code for calculating business age into one routine to use elsewhere. This approach works only for SQL Server 2000 and later, with a minimum database compatibility level of 80. In addition, the UDF option doesn’t work with temporary tables. To use the code in my examples, you need to create permanent tables called tmpTable and tmpHolidays. Creating these tables is a simple matter of dropping the # character from the code in Listings 1 and 2. (This character designates a temporary table.)

Listing 5 contains the code to create the UDF business_age. The simple command SELECT * FROM business_age(30) is used to select records in which the business age is greater than 30. Table 5 shows the resulting calculation.

LISTING 5 Code for Employing a User-Defined Function

CREATE FUNCTION business_age (@limit_days int)
RETURNS TABLE
AS
RETURN
(
SELECT RECORD_NUM,DAY1,DAY2,BUS_AGE
FROM (SELECT RECORD_NUM,DAY1,DAY2,
DATEDIFF(day,DAY1,DAY2)
-2*DATEDIFF(wk,DAY1,DAY2)
- (SELECT COUNT(*) FROM tmpHolidays
WHERE hol_date BETWEEN DAY1 AND DAY2 AND DATEPART (dw, hol_date) NOT IN (1,7))
+(CASE WHEN DATEPART (dw, DAY1) =7 THEN 1 ELSE 0 END)
-(CASE WHEN DATEPART (dw, DAY2) =7 THEN 1 ELSE 0 END) as BUS_AGE
FROM tmpTable) as D
WHERE BUS_AGE>@limit_days
)

TABLE 5 Business Age Greater Than 30

RECORD_NUM  DAY1       DAY2       BUS_AGE
6           2/2/2006   3/20/2006  31
7           2/27/2006  4/22/2006  38

The Bulk Advantage

Finding the business age between two dates stored in a database is a common task for developers. The bulk method that I suggest is a creative solution that uses just a single SELECT command. This solution has three major advantages. First, the code is exceedingly straightforward: nothing beats a single SELECT statement for simplicity. Second, no application front-end code is necessary; you can use pure T-SQL commands to complete the task. And third, network traffic is kept to a minimum because only aged records pass from the server to the client for further processing.

InstantDoc ID 95890

Marina Davydova (marina_davydova@yahoo.com) is an application development manager, specializing in SQL Server, T-SQL, Delphi,
Identifying Sections


Check out 2 ways to tackle a tricky problem 


This month, I want to introduce a problem that I like to think of as identifying sections. The generic form of the problem involves a table (call it T1) that has two columns of interest: one representing order among rows (call it id) and the other holding some value (call it val). The task at hand is to identify sections of consecutive rows that share the same value. The terms section and consecutive rows are problematic when you’re dealing with sets, but as I mentioned, the id column represents logical order among rows, and once order is defined, these terms become meaningful. For each identified section, you need to return the minimum id, maximum id, val, count of rows in the section, and possibly other aggregates.

Run the code in Listing 1 to create the table T1 and populate it with sample data. Table 1 shows the desired result. Before I demonstrate techniques to solve the problem in its generic form, I want to mention a couple of practical scenarios in which you might face this problem. One example involves bank transactions: You want to isolate consecutive periods of time in which an account had the same balance sign (negative, nonnegative), where the value element is SIGN(balance).


Another example involves statistics regarding sports: You want to identify periods in which consecutive games had the same results (losses, ties, wins), where the value element is SIGN(positive_points - negative_points). Of course, there are many other examples. I’ll demonstrate two techniques to solve the problem: one based on a subquery, and the other based on row numbers.


The Subquery- 
Based Solution 
This problem’s primary 
challenge is to come up 
with an expression that 


LISTING 1 Creating and Populating T1

SET NOCOUNT ON;
USE tempdb; -- for test purposes
GO
IF OBJECT_ID('dbo.T1', 'U') IS NOT NULL
  DROP TABLE dbo.T1;
GO
CREATE TABLE dbo.T1
(
  id  INT         NOT NULL PRIMARY KEY,
  val VARCHAR(10) NOT NULL
);

INSERT INTO dbo.T1(id, val) VALUES( 1, 'a');
INSERT INTO dbo.T1(id, val) VALUES( 2, 'a');
INSERT INTO dbo.T1(id, val) VALUES( 3, 'a');
INSERT INTO dbo.T1(id, val) VALUES( 5, 'a');
INSERT INTO dbo.T1(id, val) VALUES( 7, 'b');
INSERT INTO dbo.T1(id, val) VALUES( 9, 'b');
INSERT INTO dbo.T1(id, val) VALUES(11, 'a');
INSERT INTO dbo.T1(id, val) VALUES(13, 'a');
INSERT INTO dbo.T1(id, val) VALUES(17, 'b');
INSERT INTO dbo.T1(id, val) VALUES(19, 'b');
INSERT INTO dbo.T1(id, val) VALUES(23, 'b');
INSERT INTO dbo.T1(id, val) VALUES(29, 'a');
INSERT INTO dbo.T1(id, val) VALUES(31, 'b');
INSERT INTO dbo.T1(id, val) VALUES(37, 'b');

CREATE UNIQUE INDEX idx_id_val ON dbo.T1(id, val);
CREATE UNIQUE INDEX idx_val_id ON dbo.T1(val, id);
TABLE 1 Sections with Common Value

start_section  end_section  val  num_rows
1              5            a    4
7              9            b    2
11             13           a    2
17             23           b    3
29             29           a    1
31             37           b    2


returns a value that’s unique to the section to which the row belongs. I’ll refer to this expression as grp. Once you manage to write such an expression, the rest is pretty easy: Group the rows by grp, and return the statistics requested for the group (e.g., minimum id, maximum id, count of rows).

The code in Listing 2 demonstrates how you can use a subquery to calculate the section identifier (grp). For each outer row in T1, the subquery returns the minimum id out of the rows with a greater id and a different val than the one in the outer row. Table 2 shows the output of this query. Notice that each consecutive section of rows (based on id ordering) with the same value has a unique grp value. The last group (the one with the highest ids) has a NULL grp value, but that’s not a problem. In the next step, you’ll group the data by grp to return statistics


Itzik Ben-Gan (itzik@solidqualitylearning.com), a mentor at Solid Quality Learning, teaches, lectures, and consults internationally. He manages the Israeli SQL Server Users Group, is a SQL Server MVP, and is the author of the Inside Microsoft SQL Server 2005: T-SQL series (MSPress, 2006).
LISTING 2 Calculating Section Identifiers with Subquery

SELECT id, val,
  (SELECT MIN(id)
   FROM dbo.T1 AS InnerT1
   WHERE InnerT1.id > OuterT1.id
     AND InnerT1.val <> OuterT1.val) AS grp
FROM dbo.T1 AS OuterT1;


TABLE 2 Section Identifiers Calculated with Subquery

id  val  grp
1   a    7
2   a    7
3   a    7
5   a    7
11  a    17
13  a    17
29  a    31
7   b    11
9   b    11
17  b    29
19  b    29
23  b    29
31  b    NULL
37  b    NULL


about the section, and NULLs behave just like known values for grouping purposes; namely, all NULLs will produce one group.

The code in Listing 3 shows the complete solution. The query from Listing 2 that produces the section identifier (grp) is used to create the derived table D; the outer query groups the rows by grp and returns for each group the requested information (minimum id, maximum id, count of rows). Table 1 shows the output of this query.

This solution isn’t limited to SQL Server 2005. It works in earlier versions of SQL Server as well.


Solution Based on Row Numbers

The second solution uses the ROW_NUMBER function in a nontrivial way to calculate a section identifier. The section identifier is a combination of val and the difference between ROW_NUMBER() OVER(ORDER BY id) and ROW_NUMBER() OVER(ORDER BY val, id).

This calculation would probably be 
best understood by running the code in 
Listing 4 and examining its output, which 
Table 3 shows. You're looking at the data, 
along with the individual row number 
calculations, and the difference between 
the two (call it diff). Table 3’s output is 
sorted for demonstration purposes by 
rn_val_id (val, id ordering). Keep in mind 
that the logical ordering that matters in 
identifying sections is by id alone. Notice 
that all rows in the same section get a 
unique diff value among the rows with 
the same val. In other words, the combi- 
nation of val and diff uniquely identifies 
a section. 

To understand the logic behind this 
calculation, examine the three sections 
with a val of a, and rn_val_id values 1-4, 
5-6, 7-7. These are row numbers based 
on val, id ordering. Among the rows with 
the same val (i.e., a), the row numbers are 
consecutive across the sections. However, 
if you examine the rn_id values of the 


LISTING 3 Final Solution Based on Subquery

SELECT
  MIN(id) AS start_section,
  MAX(id) AS end_section,
  val,
  COUNT(*) AS num_rows
FROM (SELECT id, val,
        (SELECT MIN(id)
         FROM dbo.T1 AS InnerT1
         WHERE InnerT1.id > OuterT1.id
           AND InnerT1.val <> OuterT1.val) AS grp
      FROM dbo.T1 AS OuterT1) AS D
GROUP BY val, grp
ORDER BY start_section;


LISTING 4 Calculating Section Identifiers with Row Numbers

SELECT id, val,
  ROW_NUMBER() OVER(ORDER BY id) AS rn_id,
  ROW_NUMBER() OVER(ORDER BY val, id) AS rn_val_id,
  ROW_NUMBER() OVER(ORDER BY id)
    - ROW_NUMBER() OVER(ORDER BY val, id) AS diff
FROM dbo.T1
ORDER BY rn_val_id; -- same as ORDER BY val, id
rows in those sections (row numbers based on id ordering), you’ll naturally have gaps between the ranges of row numbers: 1-4, 7-8, 12-12. Now, within a section, both rn_val_id and rn_id keep incrementing in the same manner by one for each row. However, although there are no gaps between the sections in the rn_val_id values, there are gaps between the sections in the rn_id values. In other words, the difference between the two (rn_id - rn_val_id, call it diff) is constant within a section, and increases in the next section with the same val. Thus, the combination of val, diff uniquely identifies a section.

If you find this logic hard to grasp, 
examine Table 3’s output and try to deter- 
mine how you can use val, rn_id, and 
rn_val_id to calculate a section identifier. 
You might find it useful to examine the 
same data sorted by id (or rn_id, which is 
the same), as you see in Table 4. 

Finally, you’re left with aggregating the data. Listing 5 shows the final solution. The code from Listing 4 (except for the presentation-related ORDER BY) defines the common table expression C. The outer query groups the rows by val and diff (the section identifier in this solution), and returns the desired aggregates (minimum id, maximum id, val, count of rows). Table 1 shows the output of this query.
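
The same pattern carries over to the bank-balance scenario mentioned at the start of the article. The following is a minimal sketch under stated assumptions: the table variable @Balances, its columns, and the five sample rows are hypothetical and exist only to illustrate the technique, with dt playing the role of id and SIGN(balance) the role of val.

```sql
-- Hypothetical sample data: one end-of-day balance per date.
DECLARE @Balances TABLE
(
  dt      smalldatetime NOT NULL PRIMARY KEY,
  balance money         NOT NULL
);

INSERT INTO @Balances(dt, balance) VALUES('20070701', 100.00);
INSERT INTO @Balances(dt, balance) VALUES('20070702',  50.00);
INSERT INTO @Balances(dt, balance) VALUES('20070703', -20.00);
INSERT INTO @Balances(dt, balance) VALUES('20070704',  -5.00);
INSERT INTO @Balances(dt, balance) VALUES('20070705',  30.00);

-- Section identifier = (sgn, diff), exactly as (val, diff) in the article.
WITH C AS
(
  SELECT dt, SIGN(balance) AS sgn,
    ROW_NUMBER() OVER(ORDER BY dt)
      - ROW_NUMBER() OVER(ORDER BY SIGN(balance), dt) AS diff
  FROM @Balances
)
SELECT
  MIN(dt) AS start_section,
  MAX(dt) AS end_section,
  sgn,
  COUNT(*) AS num_rows
FROM C
GROUP BY sgn, diff
ORDER BY start_section;
```

With this sample data the query returns three sections: July 1 through July 2 (positive, two rows), July 3 through July 4 (negative, two rows), and July 5 (positive, one row). Note that SIGN returns 0 for a zero balance, so zero balances would form sections of their own.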


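TABLE 3 Section Identifiers Calculated with Row Numbers

id  val  rn_id  rn_val_id  diff
1   a    1      1          0
2   a    2      2          0
3   a    3      3          0
5   a    4      4          0
11  a    7      5          2
13  a    8      6          2
29  a    12     7          5
7   b    5      8          -3
9   b    6      9          -3
17  b    9      10         -1
19  b    10     11         -1
23  b    11     12         -1
31  b    13     13         0
37  b    14     14         0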
Solution to June’s Puzzle: Same Birthday

What’s the probability that, in a group of 23 randomly chosen people, at least two of them will have the same birthday? The answer to this puzzle might seem strange. Most people intuitively assume that the probability is very low. However, the probability that two people in a group of 23 have the same birthday happens to be greater than 50 percent (about 50.7 percent). For 60 or more people, it’s greater than 99 percent (disregarding variations in the distribution, and assuming that the 365 possible birthdays are equally likely). The tricky part of the puzzle is that you need to determine the probability that any two people share the same birthday, not a specific two. For the exact solution and some interesting information about the birthday paradox, check out the Wikipedia entry at http://en.wikipedia.org/wiki/birthday_paradox.
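
The 50.7 percent figure can be checked with a few lines of T-SQL. This is only an illustrative sketch, not part of the original puzzle solution; the variable names are made up. It multiplies the probabilities that each successive person misses all earlier birthdays, under the same assumption of 365 equally likely birthdays, and subtracts the product from 1.

```sql
DECLARE @n int, @k int, @p_all_distinct float;
SET @n = 23;               -- group size
SET @k = 0;
SET @p_all_distinct = 1.0;

-- Person k+1 must avoid the k birthdays already taken.
WHILE @k < @n
BEGIN
  SET @p_all_distinct = @p_all_distinct * (365 - @k) / 365.0;
  SET @k = @k + 1;
END;

SELECT 1 - @p_all_distinct AS p_shared_birthday;  -- approximately 0.507
```

Changing @n to 60 shows the probability rising above 0.99.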


July’s Puzzle: Catching a Train

Two trains race toward each other on a railway segment that’s 100 miles long. The trains are traveling at 100mph. An insect flying at 200mph flits from one train toward the other, and as soon as it arrives at the other, it flips its direction and flies back toward the first train. The insect continues bouncing back and forth between the trains until the trains crash. What’s the total distance that the insect covers until the moment of the crash?


Missing Ranking Calculations

This type of calculation (section identifier) is pretty generic and applies to many problems. In essence, you’re looking at a type of ranking calculation in which ordering is defined by one set of columns (id, in our case) and ranking is defined by another (val,

TABLE 4 Section Identifiers Calculated with Row Numbers, Sorted by id

id  val  rn_id  rn_val_id  diff
1   a    1      1          0
2   a    2      2          0
3   a    3      3          0
5   a    4      4          0
7   b    5      8          -3
9   b    6      9          -3
11  a    7      5          2
13  a    8      6          2
17  b    9      10         -1
19  b    10     11         -1
23  b    11     12         -1
29  a    12     7          5
31  b    13     13         0
37  b    14     14         0
in our case). SQL Server 2005 does support the RANK and DENSE_RANK functions, but unfortunately neither allows a separation between the set of columns that define ordering and the set of columns that define ranking. Rather, sorting and ranking are one and the same. It would have been nice to have RANK and DENSE_RANK functions that permitted the separation. Here’s an example of pseudo code demonstrating how the syntax might have looked had it been supported:

-- Don't run this unsupported syntax

SELECT id, val,
  RANK(val) OVER(ORDER BY id) AS grp
FROM dbo.T1;

With such support, solving this article’s problem would be trivial, not to mention the fact that such a function would lend itself to good optimization. With an index on (sort_cols, rank_cols), this calculation could be achieved with a single ordered pass over the leaf level of the index.


Which Solution Is Better?
I tested the performance of both solutions against more realistic table sizes than the ones I used in this article. On one hand, the subquery-based solution requires substantially more I/O than the solution based on row numbers. On the other hand, it doesn’t involve sorting, whereas the solution based on row numbers does. With a table size of a couple hundred thousand rows, both solutions run with similar performance. With large table sizes (e.g., a million rows and beyond), the nonlinear algorithmic complexity of sorting involved in the solution based on row numbers outweighs the larger amount of I/O involved with the subquery-based solution. In other words, the subquery-based solution becomes faster. However, up to a million rows in the table, the differences aren’t dramatic, so you should use the solution that you feel more comfortable with. Hopefully, in the future, we’ll see RANK and DENSE_RANK functions with separate definitions of ordering columns and ranking columns, allowing such calculations to be both simpler and much faster.
InstantDoc ID 95912


LISTING 5 Final Solution Based on Row Numbers

WITH C
AS
(
  SELECT id, val,
    ROW_NUMBER() OVER(ORDER BY id)
      - ROW_NUMBER() OVER(ORDER BY val, id) AS diff
  FROM dbo.T1
)
SELECT
  MIN(id) AS start_section,
  MAX(id) AS end_section,
  val,
  COUNT(*) AS num_rows
FROM C
GROUP BY val, diff
ORDER BY start_section;
Intermediate

Help for Query Hints

SQL Server 2005 plan guides let you specify when to use a particular hint

In recent articles, I’ve discussed some of the ways to detect information about the plans SQL Server is using to access data that your queries have requested. Our next step is to explore some of the ways that you can affect what plan will actually be used. Although Microsoft typically recommends that you let the SQL Server query optimizer determine the plan to use for a query, sometimes you need to provide a little guidance for the optimizer in the form of hints. Supplying a query hint is usually a straightforward change to your application code. However, in certain environments, you might either have no control over the code itself, or changing the code will break your licensing agreement or invalidate your support guarantees.

In such situations, SQL Server 2005’s plan guides feature can be helpful. Using a plan guide, you can instruct SQL Server to use a particular hint every time it encounters a specified query, and you don’t need to change the query itself. Although plan guides are frequently used with the new optimization hints included in SQL Server 2005, you can use them with almost any hint. Let’s start our exploration of plan guides with an overview of how they work, then look at why a plan guide might not work as intended.

On the Web: Download the listings at InstantDoc ID 96134

Plan Guides Overview

SQL Server 2005 supports three types of plan guides, all of which can be created by using the same procedure. Although almost all other objects in SQL Server 2005 use standard Data Definition Language (DDL), with CREATE to create the object, ALTER to change the object properties, and DROP to remove the object, plan guides haven’t quite caught up with that paradigm. SQL Server 2005 provides two new T-SQL stored procedures for working with plan guides: sp_create_plan_guide, which creates a plan guide, and sp_control_plan_guide, which changes a plan guide’s properties or removes a plan guide. Web Listing 1 (http://www.sqlmag.com, InstantDoc ID 96134) shows the general form of the sp_create_plan_guide procedure.
The three types of plan guides are

• SQL: tells the optimizer to look for a specific SQL statement in your application as specified in the @stmt parameter. If the @module_or_batch parameter is NULL, the SQL statement must appear in a batch by itself. Otherwise, the @module_or_batch parameter should include the exact text of the entire batch that the statement will appear in.

• OBJECT: tells the optimizer to look for a specific statement in a specific module. The module can be a stored procedure or function, and its name is given in the @module_or_batch parameter.

• TEMPLATE: tells the optimizer to build a template based on a class of queries. Since this type is a bit more complex than the other two, I’ll wait to discuss it further in a future article, after I’ve explained the other two types. And because the @params parameter to sp_create_plan_guide is also only used in conjunction with TEMPLATE plan guides, I’ll discuss that parameter in an upcoming article as well.
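
As a rough illustration of the OBJECT type, the sketch below attaches a hint to one statement inside a stored procedure. The procedure dbo.usp_TopOrderDetails and the plan guide name are hypothetical and were made up for this example; the MAXDOP 1 hint simply limits the statement to a single CPU. The text passed in @stmt has to match a statement in the module, so the example keeps the two identical.

```sql
USE AdventureWorks;
GO
-- Hypothetical procedure, created only so the plan guide has a module to target.
CREATE PROCEDURE dbo.usp_TopOrderDetails
AS
SELECT TOP 10 * FROM Sales.SalesOrderDetail ORDER BY UnitPrice DESC
GO
-- OBJECT plan guide: @module_or_batch names the module, @stmt the statement in it.
EXEC sp_create_plan_guide
  @name = N'plan_usp_TopOrderDetails_DOP1',  -- hypothetical name
  @stmt = N'SELECT TOP 10 * FROM Sales.SalesOrderDetail ORDER BY UnitPrice DESC',
  @type = N'OBJECT',
  @module_or_batch = N'dbo.usp_TopOrderDetails',
  @params = NULL,
  @hints = N'OPTION (MAXDOP 1)';
```

With the plan guide in place, the procedure’s source stays untouched while the optimizer applies the hint whenever it compiles that statement.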


Creating an SQL Plan Guide 
On my SQL Server system, the execution 
plan for the following query (which you 
should run in the AdventureWorks data- 
base) shows that SQL Server will run this 
query in parallel, over multiple CPUs: 


SELECT TOP 10 * 
FROM Sales.SalesOrderDetail 
ORDER BY UnitPrice DESC; 


Figure 1, page 28, shows the right side of 
the graphical plan for the query. 

Whether the plan you get for this 
query will involve parallelism depends on 
a number of factors, not least of which is 
whether or not your server has multiple 
processors available. If I’m having prob- 
lems with parallel queries, I might decide 
to force this query to be run on a single 
CPU, whenever the query is run as a batch 
in an application. I could do so by creating 
a plan guide specifying the previous query 
as the SQL statement and not specifying a 
value for the @module_or_batch param- 
eter. As the plan guide in Listing 1, page 
28, shows, I’ve used the MAXDOP query 
hint as part of the @hints parameter to 
specify a maximum degree of parallelism 
of 1 (i.e., a single CPU). 

Once this plan is created in the Adven- 
tureWorks database, whenever the opti- 
mizer encounters the specified statement 
in a batch by itself, it will create a plan that 
uses only a single CPU. If the specified 
query occurs as part of a larger batch, the 
optimizer won't invoke the plan guide. 
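
For comparison, here is a minimal sketch of the same query with the hint written directly into the statement. This is roughly the result the plan guide aims for without touching the query text. It assumes you're connected to the AdventureWorks database and are free to edit the query, which often isn't the case when the statement is issued by an application.

```sql
USE AdventureWorks;
GO

-- Same query as above, with the hint written inline instead of
-- being attached through a plan guide.
SELECT TOP 10 *
FROM Sales.SalesOrderDetail
ORDER BY UnitPrice DESC
OPTION (MAXDOP 1);
```

If you can change the query itself, the inline hint is the simpler route; the plan guide is for the cases where you can't.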


Kalen Delaney (kalen@insidesqlserver.com) has been
working with SQL Server for 20 years and provides SQL Server
training and consulting to clients around the world. Her most
recent book is Inside Microsoft SQL Server 2005: The Storage
Engine (Microsoft Press).


FIGURE 1 Excerpt of graphical plan for a sample query
(Operators shown for the SELECT TOP 10 query: Parallelism (Gather Streams), cost 1%;
Sort (Top N Sort), cost 82%; Clustered Index, cost 17%)


FIGURE 2 Error message after running query in Listing 3

Msg 8622, Level 16, State 1, Line 1
Query processor could not produce a query plan because
of the hints defined in this query.
Resubmit the query without specifying any hints and
without using SET FORCEPLAN.


Enabling, Disabling, or 
Removing a Plan Guide 

You can enable or disable a plan guide by 
using the sp_control_plan_guide procedure. 
For example, I could disable the plan guide 
created in Listing 1 by using this statement: 


EXEC sp_control_plan_guide 
N'DISABLE', 
N'plan_SalesOrderDetail_DOP1' 


(Note that some statements in this article 
wrap to multiple lines because of space 
constraints.) My query would then revert 
to potentially using multiple CPUs when 
it was executed. I could re-enable the plan 
guide, also by using the sp_control_plan_ 
guide procedure, like this: 


EXEC sp_control_plan_guide 
N'ENABLE', 
N'plan_SalesOrderDetail_DOP1'; 
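
To confirm which state a plan guide is in after running either statement, you can look it up in the sys.plan_guides catalog view. The following is a small sketch that assumes the plan guide from Listing 1 exists in the current database; the column list is just one reasonable selection.

```sql
USE AdventureWorks;
GO

-- is_disabled = 1 means the guide exists but is currently ignored;
-- no row at all means the guide has been dropped or was never created.
SELECT name,
       scope_type_desc,
       is_disabled,
       hints
FROM sys.plan_guides
WHERE name = N'plan_SalesOrderDetail_DOP1';
```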


I'd also use sp_control_plan_guide to remove
the plan guide, like this:


EXEC sp_control_plan_guide
N'DROP',
N'plan_SalesOrderDetail_DOP1';

LISTING 1 Creating an SQL Plan Guide
to Force a Query to Run on a Single CPU


USE AdventureWorks
GO
EXEC sp_create_plan_guide
@name = N'plan_SalesOrderDetail_DOP1',
@stmt = N'SELECT TOP 10 *
FROM Sales.SalesOrderDetail
ORDER BY UnitPrice DESC',
@type = N'SQL',
@module_or_batch = NULL,
@params = NULL,
@hints = N'OPTION (MAXDOP 1)';


Plan guides aren't intended
to save on optimization
time, only to make sure
that your applications can perform well
if you've specified hints that can help the
queries they indicate. The queries themselves
might run faster with the hints forced by
the plan guides, but optimization can take
considerably longer.


Validating a Forced Plan 

If the database in which the query is being 
run contains any plan guides at all, SQL 
Server must check whether any of the plan 
guides matches the query being processed. 
SQL Server hashes the query text and com- 
pares it with the hashed version of the queries 
for all existing plan guides, to verify a match. 
If it finds a match, SQL Server must verify 
that the plan guide matches the given query 
in the specified environment (the batch or 
module), and doing so takes extra time. Then 
the hints themselves need to be evaluated and 
compared with the plans that SQL Server 
would generate on its own. To guarantee that
the forced plan with the hints is actually valid, 
SQL Server chooses to use only a plan that it 


could come up with on its own. The forced
plan needs to be one that was considered,
then rejected by the optimizer.


LISTING 2 Creating a Plan Guide
Forcing an Invalid Join Type


USE AdventureWorks
GO
EXEC sp_create_plan_guide
N'Hash_Plan',
N'SELECT * FROM AdventureWorks.Sales.SalesOrderHeader AS h
INNER JOIN AdventureWorks.Sales.SalesOrderDetail AS d
ON h.SalesOrderID = d.SalesOrderID
WHERE h.SalesOrderID = 45639;',
N'SQL',
NULL,
NULL,
N'OPTION (HASH JOIN)';

Let’s look at an example of a plan that’s 
considered invalid. The plan guide that 
Listing 2 creates is based on the query in 
the AdventureWorks database that Listing 
3 shows. There are indexes on the SalesOr- 
derID column in both the Sales.SalesOr- 
derHeader and Sales.SalesOrderDetail tables, 
and the index for Sales.SalesOrderHeader is 
unique. SQL Server's optimizer will realize 
this and determine that each table can be 
accessed by using a seek operation. In this 
situation, SQL Server 2005 won't consider 
a HASH JOIN to be any use. Nevertheless, 
SQL Server lets you create the plan guide 
in Listing 2 that includes the hint OPTION 
(HASH JOIN). However, if you then run 
the query in Listing 3 expecting that SQL 
Server will use the plan guide, you're in for 
a shock, and you'll get the unfriendly error 
message in Figure 2. 

The moral here is be careful when using 
plan guides. Users running the query who 
don’t know anything about your plan guides 
will also get that error message, and they might 
have no idea what's causing the problem.
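
One way to check whether a hint is usable before you bury it in a plan guide is to try it inline first. The sketch below assumes SQL Server 2005 and the AdventureWorks indexes described above; under those assumptions I'd expect the first statement to fail with the same Msg 8622. The second statement disables the offending guide (named Hash_Plan in Listing 2) so that users stop seeing the error.

```sql
USE AdventureWorks;
GO

-- Try the hint directly. If the optimizer can't build a plan with it,
-- the failure shows up here rather than in front of application users.
SELECT * FROM AdventureWorks.Sales.SalesOrderHeader AS h
INNER JOIN AdventureWorks.Sales.SalesOrderDetail AS d
ON h.SalesOrderID = d.SalesOrderID
WHERE h.SalesOrderID = 45639
OPTION (HASH JOIN);
GO

-- If a plan guide with an unusable hint is already in place,
-- disable it (or drop it) so the query runs unhinted again.
EXEC sp_control_plan_guide
    N'DISABLE',
    N'Hash_Plan';
```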


Use Plan Guides with Care 
In upcoming articles, we'll look at the 
other two types of plan guides and the new 
optimizer hints that you can use with them. 
Keep in mind, however, that hinting isn’t 
something you should expect to have to do 
on every query. Plan guides are intended 
for situations where hinting is the only way 
to get the desired performance from your 
queries. Plan guides have overhead of their 
own, especially during query optimization, 
so overuse of them can potentially make 
slow performance even slower. But in those 
cases where plan guides are truly useful, they 
can be the best thing that ever happened to 
your SQL Server applications.
InstantDoc ID 96134 


LISTING 3 Query for Plan Guide in
Listing 2


SELECT * FROM AdventureWorks.Sales.SalesOrderHeader AS h
INNER JOIN AdventureWorks.Sales.SalesOrderDetail AS d
ON h.SalesOrderID = d.SalesOrderID
WHERE h.SalesOrderID = 45639;


A basic schema design for the
data warehouse


[Author’s Note: If you’ve been reading SQL Server Magazine since the beginning, 
you know that my column has focused mainly on design fundamentals for trans- 
actional databases. This month, I switch gears to cover design fundamentals for data 
warehouses and data warehousing. In subsequent columns, I'll cover topics such as 
dimensional table physical design, fact table physical design, fact table partitioning 
design, summary table design, and slowly changing dimension design. In addition, 
I'll discuss indexing techniques that are optimized for use with data warehouses and
partitioned tables, and I'll look at how to gather requirements for data warehouse
design, which can differ from the requirements-gathering techniques used for trans-
actional database design.]


The data warehouse is one of the foundational structures of a business intelligence
(BI) solution. Like transactional databases, data warehouses require a schema
design. The most basic schema design for a data warehouse is a star schema. If you 
want to create multidimensional cubes for BI analysis, using a star schema for the 
data warehouse is a good solution. In this article, I explain why the star schema is 
preferable over other schema designs, and I use an example star schema to illustrate 
this design’s benefits. (To learn more about BI and data warehouses, see the sidebar 
“Data Warehousing: The Foundation of BI,” page 30. For more information about
data warehouses and star schemas, see the Learning Path.) 

Several reasons exist for using a star schema rather than a conventional normalized 
design. First, you must use a star schema if you want to build and use OLAP cubes. 
The cube dimensions are the axes of analysis—the “by” items (e.g., by time period, by 
product line, by region). The fact table defines the cube and its purpose; you analyze
the facts by or through the different dimensions. And, perhaps most importantly, the 
star schema provides fast response time when implemented as an OLAP cube. 

Another reason for using a star schema for data warehousing is that the star schema 
parallels the way that people tend to think about and use data. No one except data 
modelers, DBAs, and some database programmers think of data the way it’s structured 
in a transactional database. The star schema, when implemented as an OLAP cube, 
lets both developers and end users more easily understand and navigate the metadata. 
In addition, you can modify and build upon a star schema as your organization’s BI 
needs expand. Unlike with conventional transactional database schemas, you don’t 
have to worry about storing a non-key attribute only one time. And last but not least, 
the star schema broadens your choice of front-end BI tools because some tools work 
only against OLAP cubes. 

Figure 1, page 30, shows an example of a star schema that’s modeled after the
AdventureWorksDW sample data ware-
house that ships with SQL Server 2005.
The schema features only one fact table,
Reseller_Sales. A fact table is a collection
of keys and measures. The keys relate 
each row in the fact table to an associ- 
ated row in a dimension table. As in a 
transactional database schema design, the 
primary key of the Product dimension 
becomes a foreign key in the Reseller_ 
Sales fact table. The measures (effectively, 
anything that’s not a key column) are 
the operational data that the statisticians 
have been waiting for and are all nicely 
packaged and ready for analysis. 

The example star schema in Figure 1 
is meant to support decision-making and 
BI software tools. If you implemented 
the schema, it would be populated from 
comparable tables and columns in the 
transactional version of the Adventure- 
Works database. 

The schema dimensions (i.e., Time, 
Product, Reseller, and Sales_Territory) 
can be mapped to tables or views in 
the AdventureWorks database, which 
facilitates transferring the operational 
data into the data warehouse and ulti- 
mately into the Reseller_Sales cube. 
Each dimension is an axis of analysis in 
the cube, so you could analyze the data 
in the cube by month, by region, or by 
business type. 

Data Warehousing: The Foundation of BI


One of the foundational structures of a business
intelligence (BI) solution is the data warehouse. To
understand the most basic schema design for a
data warehouse (i.e., the star schema), you must
first understand the relationship between BI and
data warehousing.

BI is a business management term that refers to
the applications and technologies used to gather,
provide access to, and analyze data and information
about a company’s operations. A data warehouse is
a repository for a company’s historical data. Data
warehouses can be physical or virtual, and they
can be structurally relational, quasi-relational,
summarized, cubes, flat files, or a combination of
styles. Data warehousing is the set of technologies
and techniques that you use to build and manage
the data warehouse.

Figure A illustrates the relationship between BI and data warehousing. The data warehouse gets its data from a variety of sources, including the
extraction, transformation, and loading (ETL) staging database, the online transaction processing (OLTP) transactional database, or even directly from
external data sources. Then, depending on the data needed for a BI project, you can spin off multiple OLAP cubes (also called multidimensional
databases) from the data warehouse. (For example, a bank might analyze ATM
transactions for behavior, time of day, or queue
information, whereas a retail operation might
perform a basket analysis on point of sale [...]
architecture, with the ability to tap into any
or all of the data sources, is the BI software
tools layer. This layer represents numerous BI
packages that you can use to analyze data,
generate reports, and find information for
making business decisions. You can even feed
information into automated activities and other
processes for additional analysis.

FIGURE A Relationship between BI and data warehousing
(Diagram labels: External Data; Staging (ETL); Transactional (OLTP); BI Software Tools;
Business Information; Reports; Other Processes; Automated Activities)


Notice the simple dimensional hierarchy in this schema, from Product_Category
to Product_Subcategory to Product. This structure reduces redundancy
and makes the star schema a snowflake
schema (albeit a rather lopsided snowflake
in this case). You can add as many dimensions
as necessary to the star/snowflake
schema. You can also implement more
complicated structures, such as a geography
dimension that is parent to both Reseller
and Sales_Territory, but I'll save that discussion
for when I cover dimensional table
design.

FIGURE 1 Example star (lopsided snowflake) schema
(Tables and columns legible in this copy; other entries did not survive extraction)
PRODUCT_CATEGORY: columns not legible
PRODUCT_SUBCATEGORY: ProdCatKey int null; ProdSubCatCode nvarchar(20) <ak> not null; ProdSubCatName nvarchar(50) null
TIME: FiscalQuarter tinyint null
PRODUCT: ProductKey int <pk> identity; ProductCode nvarchar(20) <ak>; ProdSubCatKey int <fk>; ProductName nvarchar(50); ListPrice money; StandardCost money
RESELLER_SALES: ResellerSalesKey numeric <pk> identity; SalesTerritoryKey int; ProductKey int; ResellerKey int; TimeKey int; SalesOrderNbr nvarchar(20); SalesOrderLineNbr smallint; RevisionNbr; OrderQuantity; UnitPrice; ExtendedAmount; UnitPriceDiscountPct float; DiscountAmount money; ProductStandardCost money; TotalProductCost money; SalesAmount money; TaxAmount money; FreightAmount money; CarrierTrackingNumber nvarchar(50); CustomerPONumber nvarchar(50)
RESELLER: ResellerKey int <pk> identity; ResellerCode nvarchar(20) <ak> not null; ResellerName nvarchar(50) null; BusinessType nvarchar(50) null; Phone nvarchar(50) null; NumberEmployees int null; OrderFrequency nvarchar(50) null
SALES_TERRITORY: SalesTerritoryKey int; Region nvarchar(50) null; Country nvarchar(50) null; Group nvarchar(50) null


The star schema design that Figure 1
shows has several notable characteristics:
1. Every table in the star schema has an
identity primary key, which prevents quib- 
bling about natural primary keys versus 
surrogate primary keys. In a data ware- 
house, the primary key assignment goes 
to the surrogate; if you have a natural key 
that you want to retain for querying, you 
designate it as an alternate key. 

2. Most of the columns in both the
dimension tables and the fact table are 
nullable. Only the primary and alternate 
keys are mandatory, and only the primary 
key is unique. You need to assume that data 
will be loaded into this data warehouse 
structure from various sources, even from 
multiple generations of operational data- 
bases. Therefore, the constraints that you 
would normally apply to enforce business 
rules in the transactional database must be 
relaxed in the data warehouse. Most of 
the columns must be nullable because you 
might not have data for them. 


3. Relationships are optional on the 
parent side (i.e., zero-to-one to zero-to- 
many). If you have no control over the 
source data’s referential integrity, you 
must allow for orphan records in the fact 
table and in lower levels of a dimensional 
hierarchy. 

4. Most columns in the fact table (i.e., 
the measures) are numbers. The fact table 
is the focus of a BI investigation, and BI 
analysts are looking for numbers and facts. 

5. Data redundancy is rampant through- 
out the star schema design; this redundancy 
is necessary for the data warehouse to reach 
an acceptable level of performance. The 
amount of data in a data warehouse is typi- 
cally enormous compared with the amount 
of data in a transactional database. When 
you write a T-SQL query against the data 
warehouse’s star schema, the redundancy 
minimizes the number of joins required to 
return the data, yielding much better performance
than if you issued the same query
against the source transactional database. 
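
The first three characteristics can be sketched in T-SQL as follows. This is a simplified illustration rather than the actual AdventureWorksDW definition: the table and column names follow Figure 1, but only a handful of columns are included, and the numeric precision is an assumption.

```sql
-- Dimension table: surrogate identity primary key, mandatory alternate
-- (natural) key, and nullable descriptive attributes.
CREATE TABLE Reseller (
    ResellerKey   int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    ResellerCode  nvarchar(20) NOT NULL,  -- alternate key, mandatory but not declared unique
    ResellerName  nvarchar(50) NULL,
    BusinessType  nvarchar(50) NULL
);

-- Fact table: keys plus numeric measures. The dimension key is nullable,
-- so a fact row can be loaded even when its parent reseller is unknown.
CREATE TABLE Reseller_Sales (
    ResellerSalesKey numeric(18,0) IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ResellerKey      int NULL REFERENCES Reseller (ResellerKey),
    SalesOrderNbr    nvarchar(20) NULL,
    SalesAmount      money NULL
);
```

The foreign key here permits a missing parent (NULL) but still rejects a key value that has no matching dimension row. If your load process can deliver such true orphans, you would leave the REFERENCES clause out and reconcile the keys after loading.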


Schema design for a data warehouse 
need not be much different than schema 
design for a conventional transactional 
database. Because the data warehouse is an 
historical archive, you can retain some sem- 
blance of normalcy in the data warehouse 
schema design. In addition, you can create 
summary tables or columns, and you'll cer- 
tainly want to add timestamps and identity 
values to individual records in each of a 
relational data warehouse’s tables. A benefit 
of retaining a relational or near-relational 
data warehouse is that the warehouse is 
a readily available reporting database that 
you can use conventional T-SQL to query. 
With no programming learning curve 
and no special tool requirements, the data 
warehouse is an immediate ROI.
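
As an illustration of the kind of conventional T-SQL report query this makes possible, the sketch below totals reseller sales by region. It assumes tables named Reseller_Sales and Sales_Territory with the columns shown in Figure 1; adjust the names to match your own warehouse.

```sql
-- Sales by region: one join from the fact table to one dimension.
-- The outer join keeps fact rows whose territory key is missing or
-- unmatched, which the relaxed constraints in a warehouse allow.
SELECT st.Region,
       SUM(rs.SalesAmount) AS TotalSales
FROM Reseller_Sales AS rs
LEFT OUTER JOIN Sales_Territory AS st
    ON st.SalesTerritoryKey = rs.SalesTerritoryKey
GROUP BY st.Region
ORDER BY TotalSales DESC;
```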
Protect UDM
with Dimension 
Data Security 


SQL Server 2005 Analysis 
Services’ security model 
can help restrict access to 
Unified Dimensional Model 
objects and data 


The explosion of viruses and hacker attacks in recent years
has pushed security concerns to the forefront of develop-
ment and application design. Responding to the need for
security tools, SQL Server 2005 Analysis Services (SSAS) offers 
a robust role-based security model for restricting access to Uni- 
fied Dimensional Model (UDM) objects and data. 

You can leverage UDM dimension data security to protect 
dimension members and the data associated with them. First, 
you need to know the fundamentals of dimension data security, 
which I explain here. In a future article, I'll discuss two practical
approaches for implementing dimension data security: a factless 
fact table and integrating with an external security service. 


Setting Up Basic Dimension Data Security 

Similar to other Microsoft and home-grown solutions, the
UDM security model leverages Windows security. The
user is authenticated based on her Windows account and authorized
according to the security policies the administrator has set up. To
simplify security management, the
UDM administrator can group Windows users and groups into
database roles. Next, the administrator assigns role permissions
to restrict the cube space the user is authorized to access.

While UDM allows you to control access all the way down
to the cube cells, most real-life security requirements are less
granular. Typically, you'll need to secure access to dimension
members and data associated with these members. Dimension data
security allows you to do just that.

My sample Dimension Security project demonstrates how you 
can set up basic dimension data security. You can obtain the Dimen-
sion Security project by going to http://www.sqlmag.com, entering
InstantDoc ID 95998, and clicking the Download the Code link. You'll 
need the AdventureWorksDW database to process the Dimension 
Security project’s Adventure Works cube. You can install the Adven- 
tureWorksDW database from the SQL Server 2005 setup program. 
In the Feature Selection step of the Setup wizard, click Advanced. 
Expand the Sample Databases folder and select the Adventure- 
WorksDW database. Alternatively, you can obtain the database by 
downloading and installing it from SQL Server 2005 Samples and 
Sample Databases (listed in the Related Resources box). 

The simplest approach to securing dimension data is to explicitly 
select which dimension members a given role is permitted to see. 
For example, the Adventure Works cube has Reseller and Geog- 
raphy dimensions. Let’s create a role whose members will have 
access to Australian resellers only. Although this example might not 
have any practical application, it demonstrates several important 
aspects of how dimension data security works. 

1. Open the Dimension Security project in Business Intelligence 
Development Studio (BIDS) or in Visual Studio 2005. Right-click 
the Roles folder and choose New Role to open the Role Designer. 
For the purposes of this demo, we won't assign members to this 
role. In real life, use the Membership tab to assign Windows users 
and groups to the role.

FIGURE 1 Dimension data security can be scoped at the cube (or database) level

FIGURE 2 Use the Basic tab to select allowed or denied dimension members


2. By default, UDM prevents the mem- 
bers of the role from accessing cubes in the 
containing SSAS project (database). Switch 
to the Cubes tab, click the Access column 
drop-down arrow, and select Read to grant 
the role access to the Adventure Works 
cube. 

3. By default, UDM grants the new role 
access to all dimensions in the database. 
Verify this by going to the Dimension tab. 
You can control dimension access at the 
cube or database level (recall that a dimen- 
sion can be shared among cubes) by using 
the Select Dimension Set drop-down box. 

4. Switch to the Dimension Data tab, 
which is where you'll set up dimension data 
security. As on the Dimension tab, you can 
set rights at the database or cube level. Let’s 
scope the dimension data security at the 
cube level. 

5. Click the Dimension drop-down arrow 
and select the Reseller dimension under the 
Adventure Works cube, as Figure 1 shows. 

6. In UDM, a dimension is a container
of attribute hierarchies. For example, the
Reseller dimension contains many attribute
hierarchies, including the Country-Region
hierarchy. The Basic tab allows you to
secure dimension members explicitly by
using one of two approaches: pessimistic
or optimistic. With the pessimistic approach,
you deny everything except a set of allowed
members called an allowed set. The optimistic
approach is the opposite: you use
it to allow all members except a set of
denied members (a denied set). For more
information about allowed and denied sets,
see “Introduction to Dimension Security
in Analysis Services 2005” (listed in the
Related Resources box).

For the purposes of this demo, take the
pessimistic approach and deny all members
except Australia. Expand the Attribute
Hierarchy drop-down list, and select the
Country-Region attribute hierarchy (as
Figure 2 shows). Select the Deselect all members
option to deny all members by default.
Select the Australia member.

Note: A cube can have many dimensions
and attributes, and it can be difficult
to remember which ones are secured. But
fear not. Once you’ve made a change to the
dimension data security, the Cube Designer
appends (dimension security defined) after
the secured dimensions and (attribute security
defined) after the secured attributes.
This lets you easily see what's going on
in the Dimension and Attribute Hierarchy
drop-downs.

7. As you select members on the
Basic tab, the Role Designer constructs
an MDX set of allowed (or denied)
members behind the scenes. You can see
this set by switching to the Advanced tab.
Because you selected only one allowed
member, the generated MDX set has the
following definition:


{[Reseller].[Country-Region].&[Australia]}


If you select more members, the allowed
set will contain a comma-separated list of
these members.

8. Save the role definition.

9. In Solution Explorer, rename the role
you've just created to Basic.role and click
Yes in the confirmation box to change the
object name as well.

10. In Solution Explorer, right-click the
Dimension Security project node and
choose Deploy to send the changes to the
server.


Testing Dimension Data Security
Let’s give dimension data security a try.

1. In Solution Explorer, right-click the
Adventure Works.cube node and choose 
Browse to open the Cube Browser. By 
default, the Cube Browser connects to the 
cube under the Windows identity of the 
interactive user (that’s you). Assuming you 
have local administrator rights on your com- 
puter, you have unrestricted access to the 
cube. That’s because the SQL Server 2005 
setup program grants implicit administrator 
rights to local administrators. 

2. Expand the Reseller dimension in
the metadata tree, and drag and drop the 
Reseller Name attribute hierarchy on the 
report columns. Observe that you can see 
all resellers (a few hundred members). 

3. Let’s now find what members of the 
Basic role would see. On the toolbar, click 
Change User, as Figure 3 shows. Select the 
Roles option, click the drop-down arrow, 
and select the Basic role. Click OK. The 
Cube Browser clears the results pane and 
establishes a new session under the Basic 
role. The message You are browsing the cube 
using the credentials of the following roles: Basic 
is displayed under the toolbar. 

Note that when a user connects to the 
server, the server evaluates the role permis- 
sions during the process of initializing the 
user session (i.e., before the cube is available 
for browsing). If the user belongs to multiple 
roles and so wishes, he can tell the server 
which role(s) he wants the server to honor 
on connect. The SSAS connection string 
property supports a Roles setting, which 
the user or the application can use to specify 
a comma-delimited list of roles. The user 
can select only roles that he is a member 
of. UDM roles are additive, so if the user 
is a member of multiple roles, the effective 
permission set is the union of the allowed 
role permissions. 

4. Drag and drop the Reseller Name 
attribute hierarchy on the report columns 
once again. Now the Cube Browser shows 
only about 40 members—the Australian 
resellers only. We can verify that the results 
are correct by either dropping the Coun- 
try-Region hierarchy next to the Reseller 
Name hierarchy in the Cube Browser or by 
using the following MDX query in SQL 


Server Management Studio: 


select {[Reseller].[Reseller Name].[Reseller Name].Members}
on 0
from [Adventure Works]
where [Reseller].[Country-Region].&[Australia];


How Dimension Data Security
Affects Data

Recall that we configured the allowed set on 
the Country-Region attribute hierarchy, but 
we used the Reseller Name hierarchy on 
the report (note that both hierarchies belong 
to the Reseller dimension). Dimension data 
security has filtered the resellers in Australia 
even though we haven't set up a filter on the 
Reseller Name attribute hier- 
archy. It turns out that behind 
the scenes, the server applies a 
special behavior called Auto- 
exists that cross-joins attribute 
hierarchies. 

Understanding Autoexists. 
Thanks to Autoexists, when 
attribute hierarchies from the 
same dimension are requested 
side by side, the server auto- 
matically cross-joins their 
members and returns only the 
members that exist in both 
hierarchies (i.e., the inter- 
secting members). Because 
the Basic role can see only 
the Australia member in the 
Country-Region attribute hierarchy, the 
Reseller Name column shows only the 
Australian resellers. Autoexists is applied 
to all attribute hierarchies within the same 
dimension. For example, if you request the 
Bank Name attribute instead of Reseller 
Name on the report, only banks for resellers 
in Australia will be returned. 

Autoexists shouldn’t be confused with 
the MDX NON EMPTY behavior. Auto- 
exists is applied at the attribute level for 
all attribute hierarchies within the same 
dimension and can’t be turned off. NON 
EMPTY simply filters out members that 
have empty cells from the query results and 
is entirely optional. For example, you can 
turn off NON EMPTY in Cube Browser
by clicking Show Empty Cells on the
toolbar. To see the difference between NON 
EMPTY and Autoexists in your report, click 
the drop-down arrow in the Reseller Name 
column header and note that only the Aus- 
tralian resellers are shown. 
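
To see Autoexists at work outside the Cube Browser, you can run a query such as the following in SQL Server Management Studio. It's a minimal sketch that cross-joins the two Reseller attribute hierarchies used in this article and deliberately omits NON EMPTY; the hierarchy names assume the sample Adventure Works cube.

```mdx
-- No NON EMPTY here: any trimming of the result comes from Autoexists alone.
-- Only country/reseller pairs that actually exist together in the
-- Reseller dimension are returned, not every possible combination.
select
  [Reseller].[Country-Region].[Country-Region].Members *
  [Reseller].[Reseller Name].[Reseller Name].Members on 0
from [Adventure Works];
```

Connected under the Basic role, you should see only pairs for Australia, because that is the only Country-Region member the role is allowed to see.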

Data security. As I mentioned at the
beginning of this article, dimension data 
security secures dimension members and 
the data associated with them. From an 
end-user perspective, members that the 
user isn’t authorized to see and their data 
simply don’t exist in the cube. Imagine that 
a global WHERE clause is applied that finds 
only the data that’s associated with allowed 
members of all secured dimensions. For this 
reason, when using dimension data security 
on a cube, avoid hard-coding dimension
members in any cube scripts or MDX que-
ries. If you hard-code a member that a user
isn’t allowed to see, the user will get an error 
when she runs the script or query. 

An interesting question for your business 
users is what totals should the user see when 
he browses the cube by another dimen- 
sion? Should the totals exclude the data 
contributed by the members the user isn’t 
authorized to see, or not? SSAS supports 
both scenarios. For better performance, the 
server includes the disallowed members 
when calculating the aggregated totals (i.e.,
the All member totals are used). For example, 
the report in Figure 4, page 36, shows the 
same results irrespective of the user’s role 
rights.

FIGURE 4 Report showing totals for all members
(Fiscal Year by Reseller Sales-Sales Amount. Values shown: $16,288,441.77; $27,921,670.52;
$36,240,484.70 for the row ending in "04"; $80,450,596.98 in the final row. The other row labels are not legible.)


If your business requirements dictate that 
the totals should reflect the contributions 
by the allowed members only, you need to 
enable a special server behavior called Visual 
Total. To do so, go back to the Advanced 
tab on the Dimension Data tab in Role 
Designer, and select the Enable Visual Total 
check box for the Country-Region attri- 
bute hierarchy you secured and deploy. After 
you reconnect in Cube Browser, the totals 
in Figure 4 will decrease to show Australian 
sales only. 

Cross-dimension security. In fact, the 
report in Figure 4 will show only fiscal year 
2004 (because apparently Australians bought 
Adventure Works bikes in 2004 only). That’s 
because (as noted above), the default NON 
EMPTY behavior filters out the empty 
members in the Time dimension (click 
Show Empty Cells on the toolbar to see all 
the years). 

But shouldn’t Autoexists propagate to 
all dimensions? Certainly there could be 
scenarios in which cross-dimension security 
would be desirable. For example, if you have 
Customer and Account dimensions that 
have a logical one-to-many relationship (i.e., 
one customer can have many accounts), it’s 


reasonable to expect that if a user is allowed 
to see only a subset of customers, she should 
see only the accounts that belong to that 
subset of customers; she shouldn’t be able to 
see other customers’ accounts. Cross-dimen- 
sion security could also yield performance 
benefits (e.g., an OLAP browser wouldn't 
have to load all the accounts of a large 
Account dimension). 

(Remember that by cross-dimension secu- 
rity, I mean preventing access to members, 
not their associated data. If a user doesn’t 
have access to a dimension member, dimen- 
sion data security prevents access to the data 
associated with a member without any extra 
work.) 

As it turns out, Autoexists is not applied 
across dimensions, and there’s nothing you 
can do to enable it. You might be tempted 
to try a workaround that simply cross-joins 
dimensions together to flow the security 
context from one to the other, as in: 


Exists ([Date].[Fiscal Year].
  [Fiscal Year].Members,
  [Reseller].[Reseller].
  [Reseller].Members)


Here, the MDX Exists function cross-joins 
all members of the Fiscal Year and Reseller 
attribute hierarchies. Because you've already 
defined an allowed set on the Reseller 
dimension, you might expect that this state- 
ment would return only years in which 
Australian resellers have sales (2004, in this 
case). Unfortunately, the statement doesn’t
work as expected, because dimension
security expressions are evaluated before the
security filters are applied.
For more information about 
the event execution order, 
see “Default members, MDX 
FIGURE 5 Allowing access to a member in a parent-child dimension automatically grants access to the member’s parents
sion. It would be nice if a
future release of Analysis Services
would support cross-dimension
Autoexists to
simplify cross-dimension security.


Parent-Child Dimensions 
Parent-child dimensions present a special 
case for a couple of reasons. First, dimension 
data security with parent-child dimensions 
can’t be applied on the dimension key 
attribute. That’s why the Employee attribute 
hierarchy doesn’t appear in the Attribute 
drop-down list when you attempt to set up 
dimension data security on the Employees 
dimension. 

Second, allowing access to a given
member in a parent-child hierarchy automatically
grants access to the member’s parents
all the way to the root member(s). If this
weren’t the case, the user wouldn’t be able to
navigate to the member. To test this access,
select the Kevin F. Brown member of the
Employees attribute, and notice that Role
Designer automatically selects his managers
David M. Bradley and Ken J. Sanchez, as
Figure 5 shows.
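
The parent-granting behavior amounts to walking up the hierarchy from each selected member. The sketch below is a rough Python model of that idea, not how Role Designer is implemented: the `PARENT` map reuses the reporting chain from the example above, while "Hypothetical Peer" and the function name `with_ancestors` are assumptions added for illustration.

```python
# Child -> parent (manager) map. The Kevin/David/Ken chain follows the
# article's example; "Hypothetical Peer" is a made-up extra branch.
PARENT = {
    "Ken J. Sanchez": None,  # root member
    "David M. Bradley": "Ken J. Sanchez",
    "Kevin F. Brown": "David M. Bradley",
    "Hypothetical Peer": "Ken J. Sanchez",
}


def with_ancestors(selected, parent):
    """Return the selected members plus every ancestor up to the root."""
    visible = set()
    for member in selected:
        # Climb until we pass the root or reach a member already collected.
        while member is not None and member not in visible:
            visible.add(member)
            member = parent[member]
    return visible


if __name__ == "__main__":
    for name in sorted(with_ancestors({"Kevin F. Brown"}, PARENT)):
        print(name)
```

Selecting only Kevin F. Brown yields him and his two managers; the unrelated branch stays out of the visible set.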


Steps to UDM Security 


Setting and maintaining robust security 
policies is an essential task that every UDM 
administrator has to master. A database role 
can enforce security policies at different 
levels in the cube. Dimension data security 
restricts members of a role from seeing 
dimension members and their associated 
data by defining appropriate allowed and 
denied sets. Autoexists automatically propa- 
gates the security filter to all attribute hier- 
archies within the same dimension. 

Consider enabling Visual Total when you 
need the aggregated values to include the 
contribution of the allowed members only 
and exclude denied members. Dimension 
data security with parent-child dimen- 
sions is applied at the parent attribute, and 
enabling a member enables access to its 
parents. 

For links to more security resources, go 
to the “Analysis Services Security” Web 
page (listed in Related Resources).
Comparative
Review


Must-Have XML Tools


Excellent XML editing and 
support for XML Web services 


I recently tested a pair of XML tools that I
would classify as the two best XML devel- 
opment environments on the market today: 
DataDirect Technologies’ Stylus Studio
2007 XML Enterprise Suite and Altova 
XMLSpy 2007 Enterprise Edition. Overall, 
both of these tools are excellent products, 
and I intend to keep both packages available 
on my machine. Yes, I installed them on the 
same machine and they coexist peacefully. 
However, although both tools are excel- 
lent XML editors, they each have unique 
characteristics and show the influence of 
their original target audience. If your job 
involves constant XML work, you might 
find benefits to having both. 

In reviewing these tools, I focused on 
several feature sets that each tool supports. 
XML editing is the primary function of 
each tool and a function that each tool 
performs admirably—so much so, in fact, 
that I don’t think there are appreciable dif- 
ferences in capability. Next I looked at the 
tools’ support for XQuery, a query language 
I generally try to avoid using. Each tool 
has a great set of XQuery-based features. 
Finally, I tested XML Web services support, 
one of the areas where I see real value for 
my long-term use of these tools outside of 
XML editing. 


Stylus Studio 2007 XML 
Enterprise Suite 

Installing Stylus Studio 2007 XML Enter- 
prise Suite is simple. After installation, I 
found that Stylus Studio had mapped the
XML extension to itself. The tool started
and ran without problems; however, if 
you need assistance, Stylus Studio has a set 
of videos available online that are, in my 
opinion, extremely valuable. This was good, 
because as a .NET developer I initially felt 
a little out of place in the UI. My impres- 
sion is that this editing environment seems 
to target those who work more with Java 
and other non-Microsoft tools, so I think 
the average Java developer will probably 
prefer the Stylus Studio environment. The 
tool doesn’t integrate with Microsoft Visual 
Studio, and in fact it doesn’t appear to 
directly integrate with anything. Although 
XMLSpy has integration features available, 
both packaged and as free downloads, Stylus 
Studio is a standalone environment. 

I worked with the Stylus Studio editor, 
and as you would expect, it’s able to read 
and validate Document Type Definition 
(DTD) and related namespace declarations. 
Although the Stylus Studio interface is set 
up with a focus on things such as links to 
the “Berkeley DB XML,” with the online 
training assistance it was easy to get started 
and up to speed quickly. The behavior of 
such things as the Project display was slightly 
different from what I expected; however, in a 
short period of time I was comfortable with 
the Stylus Studio interface. By dragging and 
dropping elements from one XML docu- 
ment into another, I was able to quickly start 
creating an XQuery document. Overall, 
I found the support for XQuery impres- 
sive and liked the fact that Stylus Studio’s 
by William Sheldon


DATADIRECT TECHNOLOGIES 
STYLUS STUDIO 2007 XML 
ENTERPRISE SUITE 


PROS: Easily mastered standard interface; 
strong XQuery tools, including support across 
several processors and Web service integration; 
EDI support; supports Java code generation 


CONS: Doesn't integrate with other 
development environments; limited Web service 
debugging 


RATING: YX WX WY 


PRICE: Single user license as tested lists at 
$895 


RECOMMENDATION: Choose this tool, which
is focused on interoperability and public XML 
standards, if you work exclusively with Java or in 
a heterogeneous environment. 


CONTACT: DataDirect Technologies • 781-280-4488 • http://www.stylusstudio.com


open architecture let me select from one of 
several different XQuery processors. One 
thing that’s consistent across the behavior 
and capabilities of this tool is its adherence 
to and support of open standards. 

This support for open standards was 
evident when I looked at the tool’s XML 
Web services capabilities. The Stylus Studio 
environment supports Universal Descrip- 
tion, Discovery, and Integration (UDDI) 
as the basis for locating and manipulating 
Web services. The interface for working 
with the XML messages used by the Simple 
Object Access Protocol (SOAP) format lets 
you view the raw SOAP message as well as 
generate test messages to the server, as Figure 
1, page 38, shows. The tool lets you connect 
to and send basic data elements to a remote 
service and view both your outbound mes- 
sages and the reply messages from that ser- 
vice. However, unlike XMLSpy, the calls to 
a Web service happen automatically and you 
can’t intercept and review the calling XML.
This makes it somewhat more difficult to 
diagnose connection-related problems. In 
terms of making calls to a Web service, the 
tool provides an excellent environment that 
even supports integration with XQuery and, 
with a few additional steps, EDI. 

One of Stylus Studio’s feature sets is its 
support for industry-standard EDI. I work 
with people who use EDI, and they felt that 
having Stylus Studio could help them if they 
didn’t already have Microsoft BizTalk Server. 
In terms of mapping new message formats, 
they felt Stylus Studio’s EDI support was an 
excellent feature and, in the right scenario, 
reason enough to select this tool. 

One aspect that I didn’t like was the 
product’s Web page devoted to “gripes” 
about its competitor. Certainly the company 
can compare its product with the competi- 
tion. However, repeating complaints about 
the competition from bloggers who have a 
history of supporting the company’s product 
is, to put it politely, a distraction. 


XMLSpy 2007 Enterprise Edition 
Installing XMLSpy 2007 was also smooth. 
FIGURE 1 Stylus Studio’s interface for working with the XML messages used by the
SOAP format


The installation software asked me if I 
wanted to assign XMLSpy as my default 
XML editor. After the installation and reg- 
istration were complete, I opened the tool, 
which defaults to a sample project. The 
default project contains excellent examples 
of several key feature sets, including the Web 
Services Description Language (WSDL) 
editor, which Figure 2 shows, and the SQL 
debugger. However, although the interface 
layout is similar to Visual Studio, it was also 
a bit overwhelming. 

When working with the XML editor 
initially, I wondered, “What’s going on
here?” In large part, my initial questions 
arose because the tool integrates elements 
such as the source XML document with its 
associated Extensible Style Language Trans- 
formations (XSLT). When these two items 
exist in the same project, the edit window 
for the source XML document will display 
the document’s data based on the XSLT. For 
a new user, this integration can be a little 
disconcerting. However, as you explore and 
use the online training videos and achieve 
the productivity that such integration sup- 
ports, what at first seemed like magic begins 
to make sense. In fact, after I got going with
XMLSpy, I found its integration with XSLT 
an advantage over Stylus Studio. 

This XSLT integration is an important 
feature, with only one drawback. The dis- 
play window for a document includes a 
bottom row of tabs, one of which is labeled 
“Authentic.” This button consistently failed 
because it was looking for another Altova 
product, even on the examples provided by 
Altova. Altova offers a suite of products that 
are all associated with XML-type activi- 
ties. The company sells XMLSpy as part of 
these packages, which include features such 
as Style Sheet editing. As I used XMLSpy, 
I encountered several instances in which 
I clicked buttons that essentially told me I 
needed another product to make the feature 
that I was trying to use actually work. The 
Authentic tab is just one example; the same 
thing occurs when you open the DTD/ 
Schema menu and the menu option offers 
to install a trial version of MapForce. Altova 
needs to follow the Visual Studio model, 
which just omits unavailable features, and 
remove such messages. 

With regard to XQuery, one of the fea- 
tures I really liked was the way XMLSpy’s 
XQuery debugging worked. The fact that I 
could stop an XQuery execution in process 
and review all the variables was great. I discovered
that Altova uses its own custom
processor for XQuery. Although I’m not a
big XQuery user, this practice strikes me as a 
potential risk if I’m considering developing 
an XQuery query that will eventually run 
in another environment. Working against a 
custom processor means that my XQuery 
might not act exactly the same in the target 
environment, and debugging any problems 
at that point would become difficult and 
time consuming. Although the risks might 
not be a problem for a developer looking to 
integrate with .NET and related Microsoft 
technology only, it could be a significant 
drawback in a heterogeneous environment. 

As for XML Web services, I was impressed 
with XMLSpy’s editing environment. The 
graphical layout of the SOAP message struc- 
ture helped me visualize the data that was 
being transferred. Although the debugger 
is powerful, by default it’s disabled, so if you
jump in without getting more information
related to enabling this debugger,
you might have a problem.
However, after you enable the
debugger, it actually intercepts the
outbound request and inbound
response to your Web service so
that you can see what’s actually
being sent across the wire. After
I got accustomed to the environment,
I found XMLSpy to be an
excellent tool.

Finally, one feature that really 
appealed to me about XMLSpy 
was its integration with Microsoft 
Internet Explorer (IE). XMLSpy 
let me right-click and open the 
HTML source for a Web page 
within its editing environment. I 
know that hundreds, if not thou- 
sands, of other developers will 
scream in disgust at the very idea. 
But for each of them, another
developer will say, “Hey, that’s
cool.” To me, this feature helps illustrate the
difference in approach between these two
tools. Stylus Studio is a standalone product
that focuses on playing in its own sandbox.
XMLSpy, however, plays well with others:
It follows a more integrated method of 
ALTOVA XMLSPY 2007
ENTERPRISE EDITION 
PROS: Supports Microsoft Visual Studio, .NET, 
and Eclipse Web services platforms; supports 
Java, C++, and C# code generation; offers Web 
service design and debugging tools 


CONS: Initial experience with UI can be
overwhelming; some features such as Web 
service debugging can be challenging at first 
use; features that require other Altova products 
aren't hidden or disabled 


RATING: XX YX WX WX YL 
PRICE: Single user license as tested lists at $999 


RECOMMENDATION: Choose this tool if you 
work exclusively with Visual Studio or if you do 
the majority of your work with .NET. 


CONTACT: Altova • 978-816-1600 • http://www.altova.com
FIGURE 2 XMLSpy’s WSDL editor


leveraging the system to provide a variety 
of features that a user might find valuable. 


Reviewing the Pros and Cons 
Both products are excellent tools, and each 
has a feature list that’s huge. You'll probably 
make your choice based on specific technical 
requirements or depending on the tool’s
approach to problem solving. Although each 
tool has unique features, the tools overlap in 
most areas. 

If you need to use one of these tools 
because it has a specific feature such as EDI 
support or Visual Studio integration, but it’s 
not a tool that fits your style, let yourself get 
acclimated to that environment. As someone 
who uses Visual Studio almost every day, I 
felt that XMLSpy was a companion envi- 
ronment. However, I was impressed by how 
comfortable I felt working within Stylus 
Studio and how productive I was when 
using it. 

I want to work in both of these tools’ 
environments because each has some unique 
characteristics. For example, even though 
each supports XML Web services, each does 
so with a different feature set—Stylus Studio 
provides what I consider to be a better 
environment for exploring external Web 
services whereas XMLSpy provides a richer 
environment for developing such services
locally. 

Thus my plan is to keep both tools 
available, although I set XMLSpy to be my 
default XML editing environment because 
I like the general layout and display of 
the editor when working with XML and 
because of the tool’s integration with Visual 
Studio and .NET. However, I don’t want 
to surrender such features as Stylus Studio’s 
support for open standards or its support for 
UDDI lookup of Web services; someday I 
might be involved in an EDI project and 
I would want to be able to use Stylus 
Studio in that case. In fact, if I spent a large 
percentage of my time working with non- 
Microsoft tools, I could easily see myself 
setting Stylus Studio as my default handler 
for XML file extensions. As a result, it seems 
appropriate to designate both of these tools 
as earning an Editor’s Choice award. You 
might not have the option of using both, but 
either one of these tools could be the right 
tool for you.
Now’s the time to
settle into some siz- 
zling summer reading 
on the latest IT trends 
and technologies. 
These eBooks offer 
an in-depth look into 
topics such as email 
security, disaster 
recovery, messaging 
management, 
emerging technologies 
and more. View a com- 
plete listing of eBooks 
at windowsitpro.com/ 
ebooks. 


What's Hot 
This Summer: 


Technical Topics You Can't Afford to Miss 


File Area Networks: Your First 
Look at FAN Technology 

Gain control over the growing amount 
of file data in your enterprise. Learn 
how File Area Networks (FANs) can 
help you centralize file consolidation, 
migration, replication, and failover. 
Start streamlining your file 
management projects today! 
windowsitpro.com/go/brocade/julyad 


Data Protection and Disaster 
Recovery Tips 

Discover a wealth of information about 
how to protect and secure your data 

in the event of a disaster. You may not 
be able to predict the exact details of a 
disaster, but you can be prepared with 
a solid response for when one strikes. 
Disaster can strike anywhere—not 
just where severe weather can hit—so 
make sure you're ready when it does. 
windowsitpro.com/go/ca/julyad 
Messaging Management

A secure mail and messaging 
infrastructure is fundamental to your 
business and any organization should 
plan for the appropriate message 
hygiene, availability, and control 
services from the start. Introduce 
yourself to three fundamental mail and 
messaging management services— 
security, availability and control servic- 
es—and learn how to implement them. 
windowsitpro.com/go/symantec/julyad 


Spam Fighting and Email Security 
for the 21st Century

Protect your users and your network 
against email-borne threats. Gain the 
knowledge required to understand the 
real threat that email-borne attacks 
pose, and how to address those attacks 
in a way that reduces risk while 
ensuring users aren’t impacted. 
windowsitpro.com/go/ironport/julyad 
The Move to Multicore


More power means better performance 


for your systems 


The move to multicore processors is
boosting system performance to levels 
that have never been seen outside of high- 
end SMP systems. Better yet, this increase in 
processing power is happening without the 
huge increases in price and power require- 
ments that moving to traditional SMP sys- 
tems entails. For SQL Server systems, added 
processing power can mean increased levels 
of system performance as well as higher 
levels of scalability. Let’s take a look at some 
of the latest developments in the dual-core 
and quad-core processors that Intel and 
AMD are bringing to market and see what 
this technology means for your SQL Server 
environment. 


Dual-Core Performance

When manufacturers became unable to
improve processing power simply by
boosting processor speed, both Intel
and AMD realized that the easiest path
to more power was through parallelism.
The ever-shrinking size of processors
made it possible to produce dual-core
chips, which combine two processors
on a single die. An added benefit
of dual-core chips is that they nearly
double the available CPU power while
using the same power envelope (i.e.,
the same wattage requirements) as a
single processor. Figure 1 shows the
results of running the SAP Sales and
Distribution (SD) Users benchmark
on AMD Opteron systems that were
identically configured except for the
processor. The dual-core Opteron 875
provided a 74 percent increase in
performance over the single-core Opteron
852.

FIGURE 1 Performance of single-core processors vs. dual-core processors (SAP SD Users Results)


Dual-Core Design

In 2005, Intel became the first to enter the
dual-core market with the release of the
Pentium D processor, built using the Intel
NetBurst microarchitecture. In January
2006, Intel switched to the Core microarchitecture,
which uses a shorter instruction
pipeline than does NetBurst, letting
processors execute substantially more
instructions per clock cycle and achieve
higher levels of performance even though
they run at a lower clock frequency than
earlier Intel CPUs.

The shared front-side bus technology
of Intel’s dual-core design gives each processor
half the bandwidth of the front-side
bus. Memory and I/O access operations
also share the bus, making the bus speed
a critical factor in overall system performance.
Intel’s latest dual-core processor, the
Core 2 Duo, is built using 65 nanometer
(nm) technology and integrates both cores
on a single die. Each core has 64KB of
dedicated L1 cache (a 32KB instruction
cache and a 32KB data cache), and both
cores share a 4MB L2 cache. The Core 2
Duo has a new power-saving design and
a 1066MHz front-side bus. It also supports
Intel Extended Memory 64 Technology
(EM64T), Intel’s 64-bit memory extension,
and Intel Virtualization Technology
(Intel VT).

Following Intel’s lead, AMD introduced
the 64-bit dual-core Athlon 64
X2, and later the dual-core Opteron. In 
AMD’s Direct Connect Architecture, 
each CPU has an integrated memory 
controller and the HyperTransport bus 
runs at 1GHz and allows an 8GBps direct 
connection between the CPUs, I/O, 
and memory. The AMD Opteron 875 
dual-core processor has an L1 cache with 
64KB for instructions and 64KB for data, 
plus a 1MB L2 cache. AMD manufactures
its dual-core line using 90nm technology. 
In February 2007, AMD released new 
dual-core Opterons that run at clock rates 
up to 2.8GHz and provide greater power 
efficiency than earlier models. 


Quad-Core Performance 

The jump from dual-core to quad-core 
processors delivered another big perfor- 
mance boost. Figure 2, page 42, shows the 
results of some benchmark tests on the 
Quad-Core Intel Xeon processor X3220 
and the Dual-Core Intel Xeon processor 
3070. The SPECfp_rate_base2000 bench- 
mark measures floating point performance, 
the SPECint_rate_base2000 measures 
integer performance, and the LINPACK
measures billions of floating point opera- 
tions per second. SPECjbb2005 is a Web- 
based Java benchmark that simulates an 
order entry system. Although the X3220 
runs at a slightly slower clock rate, it outperforms
the 3070 in all the benchmarks.


Quad-Core Design 

With the release of the Quad-Core Intel 
Xeon 5300 series in November 2006, 
Intel is the clear leader in the quad-core 
race. AMD won’t have an entry until mid
2007, when it will introduce a quad-core 
chip code-named Barcelona. However, 
Intel’s and AMD’s quad-core designs are 
significantly different. 

Intel’s quad-core design puts two dual- 
core processors onto a single chip. In other 
words, instead of being a “native” quad- 
core processor, Intel’s quad-core Xeon is 
actually a dual dual-core chip. Although 
this architecture enabled Intel to beat 
AMD to market, the design isn’t optimal. 
When cores that are on separate
dies exchange data, the data must be
sent over the front-side bus and through 
the memory controller, which isn’t the 
most efficient mechanism. In addition, as 
with previous Intel designs, this approach 
makes the overall system speed dependent 
on the speed of the front-side bus. Despite 
these drawbacks, the additional CPUs and 
improvements in the Intel Core micro- 
architecture make Intel’s quad-core chips 
the fastest x64-compatible processors avail- 
able today. 


In contrast, AMD’s upcoming Barcelona
mounts four independent CPUs on one
die. AMD’s quad-core chip will utilize the
Direct Connect Architecture. Barcelona will
be built using a 65nm
process technology and will have versions
that utilize a 68-, 95-, or 120-watt power
envelope. This model enables all four cores
to act independently, leading to more efficient
power consumption because each
core can adjust its frequency according to
the workload.

TABLE 1 Dual-Core Systems from HP, Dell, and IBM

Vendor  System              Processor  Maximum CPUs  Base Price
HP      ProLiant DL580 G4   Xeon       4             $6,649
HP      ProLiant DL585 G2   Opteron    8             $6,999
Dell    PowerEdge 6850      Xeon       4             $5,148
Dell    PowerEdge 6950      Opteron    4             $4,679
IBM     System x3755        Opteron    4             $7,258
IBM     System x3850        Xeon       4             $8,455

Among other important enhancements, 
the Barcelona design sports 128-bit floating- 
point processing and a new 2MB L3 cache 
that’s shared by all the processors. Because 
each processor performs more work per 
clock cycle, an estimated 15 percent effi- 
ciency improvement per core results in an 
improvement in processor performance of 
about 40 percent. The AMD quad-core design
is socket-compatible with existing Socket F 
dual-core processors. Consequently, existing 
dual-core systems built with the AMD 
Socket F can be upgraded to quad-core 
by performing a CPU swap and then 
upgrading the BIOS. The scalability of the 
Barcelona should also be greater than that of
Intel’s quad-core CPU. Each
core on Barcelona’s quad-core
die could theoretically
be upgraded to a dual-core
chip in the future, essentially
enabling a design that incorporates
four dual-core CPUs
on one quad-core die.

FIGURE 2 Benchmark scores for dual-core and quad-core processors (axes: Benchmark, Performance Rating; series: Quad-Core Intel Xeon X3220 (2.4 GHz), Dual-Core Intel Xeon 3070 (2.67 GHz))


SQL Server and Multicore

Because it’s designed to take
full advantage of multiprocessor
SMP systems, SQL
Server can utilize all of the
cores in a multicore system;
you don’t need to make
any system or configuration
changes. In addition, Microsoft
doesn’t charge licensing
premiums for multicore
processors. The company charges for SQL
Server (and all other Microsoft products 
that are licensed by CPU) according to the 
number of sockets rather than the number 
of cores. For example, if you have a 2-way 
system that’s running single-core proces- 
sors, you need to purchase a license for two 
processors. But if you later upgrade that 2- 
way system to two dual-core processors, no 
change in licensing is required because the 
number of motherboard sockets doesn’t 
change. 
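
The per-socket rule is simple enough to express as arithmetic. The short Python sketch below restates the example from the paragraph above; the function name `licenses_required` is an assumption for illustration, and the model covers only the per-processor rule as described here, not any other licensing option.

```python
def licenses_required(sockets, cores_per_socket):
    """Per-processor licenses under the socket-based rule described above.

    cores_per_socket is accepted only to make the point that it does
    not change the result (hypothetical helper, not a Microsoft tool).
    """
    if sockets < 1 or cores_per_socket < 1:
        raise ValueError("sockets and cores_per_socket must be at least 1")
    return sockets


if __name__ == "__main__":
    # 2-way system with single-core processors
    print(licenses_required(sockets=2, cores_per_socket=1))
    # Same system upgraded to dual-core processors
    print(licenses_required(sockets=2, cores_per_socket=2))
```

Both calls return the same count, which is the point of the example: the core count per socket never enters the calculation.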

The best thing about hardware com- 
petition is the price and performance 
benefits it brings to customers. Intel and 
AMD’s multicore duel brings those ben- 
efits in spades by delivering SMP power 
at single-CPU prices. Table 1 shows a few 
representative server offerings from HP, 
Dell, and IBM for dual-core systems, any 
of which would work well for running 
SQL Server. 


The Future Is Multicore 
Intel announced its next line of multi- 
core chips, code-named Penryn, last 
fall and expects to make those products 
available later this year. The Penryn line 
of processors will utilize a new 45nm 
manufacturing technology, enabling Intel 
to increase processing speed while simul- 
taneously reducing power requirements 
and heat generation. The move to 45nm 
manufacturing will give Intel a temporary 
leg up on AMD in the game of processor 
leapfrog, but AMD plans to bounce back
with its own line of 45nm chips for 2008. 
Look for Intel’s next big move in late 
2008 with its rumored eight-core processor,
code-named Dunnington.
InstantDoc ID 95995 
Got a great new product?
Send announcements to products@sqlmag.com. 


New Products 


BUSINESS INTELLIGENCE 


Blake Eno (products@sqlmag.com) is product editor for Windows IT Pro and SQL Server Magazine. 


SoftArtisans’ OfficeWriter 3.8 lets you use Microsoft Excel and Microsoft Word to design
and deliver SQL Server Reporting Services (SSRS) reports (.rdl files). OfficeWriter 3.8
adds support for the ADOMD.NET DataReader class as a data source and improves
support for Microsoft Office 2007. Also new is the ability to copy and paste values, formats,
and formulas between Excel spreadsheets; improved image insertion in Word
documents within Microsoft .NET Framework 2.0; and quicker spreadsheet
generation when importing certain data types with an ExcelTemplate object. A
free trial version of OfficeWriter 3.8 is available. For more information, contact
SoftArtisans at 877-763-8278 or info@softartisans.com or go to
http://www.softartisans.com.


CONCEPTUAL MODELING TOOL


Identify Relationships
Between Data and
Processes


Embarcadero Technologies has released
its new business-process and conceptual
modeling tool, EA/Studio Business Modeler
Edition. EA/Studio reveals the interrelationships
between data architectures,
database applications, and business processes
so that you can understand the effect
of changing any component before you
implement the change. EA/Studio supports
importing models created in Microsoft Visio.
Pricing for EA/Studio starts at $495 per
seat. For details, contact Embarcadero Technologies
at 415-834-3131 or http://www.embarcadero.com.
DATABASE AUTOMATION


GridApp Systems announced a new ver- 
sion of its Clarity database automation and 
management software. GridApp Clarity 
3.5 offers improved patch management, 
including support for SQL Server and clus- 
tered environments; supports patching and 
patch rollback for non-Clarity-provisioned
databases; and includes provisioning and
patching of Oracle and SQL Server databases. 
The new version also features customizable 
reports for the configuration management 
database and a central interface for tracking 
and managing configurations across all databases,
nodes, and clusters in a heterogeneous
environment. Supported platforms include
SQL Server, Oracle, IBM DB2, and MySQL.
For more information, contact GridApp
Systems at 646-452-4100 or 408-573-6305
or see http://www.gridapp.com.


BUSINESS INTELLIGENCE 


Create Reports in Microsoft 
Office Word 2007 


iT-Workplace’s Intelligencia for Word 2007, 
an add-in for Microsoft Office Word 2007, 
lets users create and distribute BI reports 
within Word using data from relational and 
OLAP data sources. To create a report, users 
select a data source, build the query by 
using a visual query interface, and format 
the results. Intelligencia can combine data 
from different sources without program- 
ming and lets users join multiple tables 
and sort, group, and filter results. Users can 
create reports that incorporate cube data and 
metadata by using OLAP functions that are 
updated automatically as the underlying data 
changes. For more information, contact iT-Workplace
at sales@it-workplace.com or go
to http://www.it-workplace.com.


TECHNICAL RESOURCE: 
DEVELOPMENT 


Become a Better 
SQL Server Developer


Mike Murach and Associates has published
Murach’s SQL Server 2005 for Developers,
a book that aims to boost SQL skills by
revealing features of SQL Server that many 
developers don’t know about. In addition to 
queries, cursors, views, and stored procedures, 
topics include using enhanced SQL features 
for working with XML data, SQL Server 
Management Studio, and the SQL Server 
2005 CLR integration feature that lets devel- 
opers create database objects in Microsoft 
.NET languages. Murachs SQL Server 2005 


for Developers lists for $52.50. For more infor- 


mation, contact the publisher at 800-221-5528 


or go to _http://www.murach.com. SOL 
InstantDoc ID 96189 


www.sqlmag.com SQL Server Magazine July 2007 43


Ordering the SQL Master CD is like 
pocketing a team of SQL experts. 


Packed with thousands of articles, bonus 
content, and loads of expert advice— 
getting the SQL Master CD is like 
pocketing your very own team of 
professional SQL consultants. 


And at a fraction of the cost.
Search for articles by keyword, subject,
in lightning-fast time. Order the
SQL Master CD today.
Insights from the SQL Server industry


Industry Bytes 


A Holistic View of 
High Availability 
What does it take to have high-availability
databases? According to John Posavatz,
vice president of product management for
Neverfail Group (http://www.neverfailgroup.com),
it’s not just a matter of setting up a
high-availability server, or even of monitoring
one for the common problems you
might expect. It’s often the problems you
don’t expect, those that crop up as your
server environment changes over time, that
cause the most damage. “Very rarely does
a server simply fail,” explained Posavatz.
“When a server goes down, it’s often because
someone made a simple mistake in configuration
or added software without realizing all
the ways it could affect the server.”
For example, a common cause of
database corruption in SQL Server
environments is antivirus software.
When file-level antivirus software
performs its scans, it can detect
anomalies and quarantine critical
database files, thus corrupting the
entire database.
Neverfail Group realized that
preventing this kind of problem
is at the core of making sure data
remains highly available. The company
has released the latest version
of its Server Check Optimization 
Performance Evaluation (SCOPE) 
tool, an analysis tool that works with the 
Neverfail high-availability server products, 
including Neverfail for SQL Server. The 
SCOPE tool helps you prevent server 
problems in several ways. First, before you 
install Neverfail, you can use SCOPE to 
scan your servers, assess their reliability, and 
eliminate potential problems before you 
start. After Neverfail is installed and running 
on your system, SCOPE continually assesses 
the ongoing health of your servers and 
proactively addresses any potential reliability 
problems. The tool benefits from an ongoing
connection with Neverfail’s rule repository,
which collects information about problems 


Solve Your SSIS ETL Headaches


If you’re working with SQL Server Integration Services (SSIS), you might already
have discovered one of the problems with its extraction, transformation, and
loading (ETL) process: SSIS doesn’t provide an easy way to extract, transform, or
load data from unstructured or semi-structured data sources such as Microsoft Excel
spreadsheets and reports, raw text files, Oracle databases, and ODBC data sources.

According to Vassil Kovatchev, CTO of Interactive Edge
(http://www.interactiveedge.com), the current SSIS solution for bringing unstructured data into
the data flow is to write hard-coded custom scripts—a time-consuming manual process. 
Interactive Edge provides a more satisfactory solution with its new Visual Studio plug-in, 
DataDefractor. In a recent conversation with our editors, Kovatchev discussed the example 
of a real-estate report in Excel that displayed years in columns and months in rows. 
Kovatchev explained, “Writing a script to transform a report like this would typically take 
about five days. With DataDefractor, the transformation takes about ten minutes.” Figure 
1 shows the way in which DataDefractor takes apart and stores data from an Excel sheet. 
That savings in time is enough to make most database professionals sit up and take 
notice. The DataDefractor tool is a custom SSIS data-source-flow component that’s
fact-oriented and rules-based. The wizard-like interface lets you customize dimensions
and measures to quickly transform unstructured or semi-structured data into normalized,
usable data. How did Microsoft overlook the need for this kind of component in SSIS? As
Kovatchev explained, Microsoft is platform-oriented and is happy to rely on ISVs to fill
in gaps on the platforms it creates. Companies such as Interactive Edge can then find
opportunities to provide useful tools for making the life of a database pro easier.
DataDefractor, which is currently in beta, was officially released March 16, 2007.

FIGURE 1: DataDefractor’s Data Flow view illustrating how data from an Excel document
is distilled to SQL databases
—Dawn Cyr 
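
The kind of hand-written transformation script Kovatchev describes can be sketched in a few lines for the simplest case. The following Python sketch is only an illustration under stated assumptions, not DataDefractor's method or any SSIS API: the function name `unpivot_report` and the sample figures are hypothetical, and the report is assumed to be a clean grid with months in rows and years in columns. Real reports with merged cells, subtotals, and stray labels are what make such scripts take days.

```python
from typing import List, Tuple


def unpivot_report(rows: List[List[str]]) -> List[Tuple[int, str, float]]:
    """Turn a months-by-years grid into normalized (year, month, value) records.

    Assumes the first row is a header whose first cell is a corner label
    and whose remaining cells are years. Blank cells are skipped.
    """
    header, *body = rows
    years = [int(cell) for cell in header[1:]]
    records = []
    for row in body:
        month, *cells = row
        for year, cell in zip(years, cells):
            if cell == "":
                continue  # no value reported for this month and year
            records.append((year, month, float(cell)))
    return records


# Hypothetical sample data standing in for the real-estate report.
report = [
    ["Month", "2005", "2006"],
    ["Jan", "10", "12"],
    ["Feb", "11", ""],
]
normalized = unpivot_report(report)
```

Each output record carries its own year and month, so the rows can be loaded into an ordinary fact table without any knowledge of the original layout.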
that other users encounter and incorporates
the new information into its scans.

It’s these preemptive warnings that make
Neverfail’s approach unique, says Posavatz.
“Almost all of Neverfail’s customers make
backups, but if you’re to the point that
you’re restoring data from a tape backup,
then something has gone horribly wrong.”
Neverfail’s approach is to look not just at
your server setup but at the composition of
your entire system, to see how all the pieces
in your environment affect your server and
to prevent potential problems. “Unless you’re
taking a holistic view of your system,” says
Posavatz, “you’re likely to miss the one thing
that can bring your servers down.”

—Dawn Cyr 

InstantDoc ID 95499 




Connecting readers with the products and services they need most.

Advertiser                               Page          URL
Altova                                   [illegible]   www.altova.com
ApexSQL                                  [illegible]   www.apexsql.com
AppDev                                   47            www.appdev.com
Business Objects                         [illegible]   www.businessobjects.com/rareoccurrence
Dell                                     15            www.dell.com/sql
Dundas Software                          32            www.dundas.com
Embarcadero                              2, 16b        www.embarcadero.com
First Advantage Data Recovery            46            www.datarecovery.net
Fujitsu Computer Systems Corporation     18            www.us.fujitsu.com/computers/reliability2
IBM Corporation                          Cover 2, 1    www.ibm.com/takebackcontrol/info
Idera                                    Cover 4       www.idera.com
Interactive Edge LLC                     8             www.datadefractor.com
SQL Sentry                               Cover 3       www.sqlsentry.net
Melissa Data Corporation                 47            www.melissadata.com
SQL Server Connections                   22            www.DevConnections.com
SQL Server Magazine                      26, 40, 44    www.sqlmag.com
Windows IT Pro                           47

If you would like your products and services featured among these pages, call:
Key Accounts Director Richard Resnick, 800-949-4007, rresnick@sqlmag.com

Additional advertising information is also available at www.sqlmag.com


SQL SERVER Marketplace
A SPECIAL MONTHLY ADVERTISING SECTION


FIRST Advantage 
SQL DATABASE DISASTER RECOVERY 


Even a well-configured, fault-tolerant MSSQL Server can fail!

» I/O errors
» “Suspect” mode
» Deleted or corrupted log file
» Deleted data (tables, records, system objects)
» Corruption caused by RAID failure
» Corrupted backup file
» Torn pages


IT IS POSSIBLE TO RECOVER! 
In most cases it is possible to repair the database to an attachable state. If we are unable to 
repair the database to a point where it will attach, we will recover as many tables and 
records as possible with our specialized software tools. This recovered data can then be 
merged back into an empty database that your front end application will work with. 


Call us at 877.304.7189 or e-mail mssql@datarecovery.net


“The ability to maintain clean, reliable marketing data across multiple capture
mediums is essential for everything we do.”
Your Data Superstore


Now you can manage your Windows IT Pro accounts ONLINE
To log on, you will need your customer number from an
invoice or your magazine’s mailing label.


Learn Microsoft® SQL Server™ 2005
and Business Intelligence (BI)


Introducing the latest in SQL Server 2005 and Business Intelligence (BI) courses from AppDev, the nation’s leader in developer
learning. Our nationally recognized industry experts will walk you step-by-step through the features and functionalities of these
exciting SQL Server 2005 technologies!
Microsoft SQL Server 2005: 20 CD-ROMs or 2 DVD-ROMs
Microsoft SQL Server 2005 Reporting Services (SSRS): 8 CD-ROMs or 1 DVD-ROM
Microsoft SQL Server 2005 Analysis Services (SSAS): 8 CD-ROMs or 1 DVD-ROM
Microsoft SQL Server 2005 Integration Services (SSIS): 8 CD-ROMs or 1 DVD-ROM
BUY 1, GET 1


For a limited time, purchase one of our SQL Server 2005 courses above, get another SQL Server 2005 course
(of equal or lesser value) FREE! Or get all four SQL Server 2005 courses in one money-saving learning suite!


Visit our Web site today for offer details, plus course outlines and
more information about our new SQL Server 2005 courses.
AppDev Expert Andy Baron
Same great training, now for your entire team: KSource Online Learning™, www.ksourceit.com


System Center Data Protection Manager 2007

by Michael Otey


System Center Data Protection Manager (DPM) 2007 is Microsoft’s latest continuous data
protection (CDP) product, and it’s particularly important to SQL Server users for how it
differs from DPM 2006. DPM 2006 was just a file system backup product and wasn’t able to
back up SQL Server databases. DPM 2007 changes all that. (Although it’s currently in beta,
you can find out more about it at http://microsoft.com/systemcenter/dpm/default.mspx.)
Like DPM 2006, DPM 2007 provides disk-based CDP and enables quick data recovery from
disk-based backups. However, DPM 2007 is also fully integrated with SQL Server for database
backup and restore. Here are five reasons you might want to take a look at DPM 2007.


Support for Multiple Servers 
Most businesses, even small-to-midsized 
businesses (SMBs), have multiple server 
systems to back up. DPM 2007 can back up 
not only SQL Server 2000 and later, but also 
Windows Server 2003 and later (including 
Longhorn Server), Windows Storage Server, 
Windows SharePoint Services 2.0 and later, 
Microsoft Exchange Server 2003 and later, 
Windows Vista, and Windows XP. 


Restores to Original Server or
Alternate Server

Another important disaster recovery feature
DPM 2007 offers is the ability for administrators
to restore a database to its original
server and location, to its original server as
a new database, or to an alternate recovery
server. DPM 2007 also supports Automated
System Recovery tools for bare-metal
restores.


64-Bit Support
Following the trend set by other Microsoft
server products, DPM 2007 fully supports
64-bit architecture, which gives DPM the
ability to address more memory and offer
greater scalability. However, one point to
note is that the 64-bit support is x64 only:
There is no Itanium (IA-64) support.
Cluster-Aware

Although the DPM 2007 server itself can’t
be clustered, DPM 2007 is cluster-aware.
When a DPM 2007 agent is installed on a
cluster, DPM detects the cluster’s nodes and
identity as well as the virtual server running
on the cluster. In the event of a failover,
DPM 2007 automatically protects the virtual
SQL Server.


SQL Server Integration
DPM 2007 uses block-level synchronization
and the SQL Server Volume Shadow Copy
Service (VSS) writer along with transaction
log synchronization to protect SQL Server
databases. DPM 2007 makes an initial copy
of the protected databases, then synchronizes
transaction logs with the DPM server
on a regular basis. The SQL Server VSS
writer sends only the updated file blocks
from the protected databases. DPM 2007
can store up to 512 shadow copies of a SQL
Server database.
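
To make the idea of block-level synchronization concrete, here is a minimal Python sketch of the general technique: after an initial full copy, only the blocks that differ are sent to the replica. It is not DPM's implementation and does not use VSS. The names `changed_blocks` and `apply_blocks`, the hash comparison, and the tiny 4-byte block size are all assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes,
                   block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Return only the blocks of `new` that differ from `old`, keyed by block index."""
    old_hashes = block_hashes(old, block_size)
    changed = {}
    for index, offset in enumerate(range(0, len(new), block_size)):
        block = new[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old_hashes) or old_hashes[index] != digest:
            changed[index] = block
    return changed


def apply_blocks(replica: bytes, changes: Dict[int, bytes], new_length: int,
                 block_size: int = BLOCK_SIZE) -> bytes:
    """Bring the replica up to date by overwriting only the changed blocks."""
    buf = bytearray(replica[:new_length].ljust(new_length, b"\0"))
    for index, block in changes.items():
        start = index * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)


# Hypothetical data: the replica holds `old`; the protected file is now `new`.
old = b"AAAABBBBCCCC"
new = b"AAAAXXXXCCCCDD"
changes = changed_blocks(old, new)
synced = apply_blocks(old, changes, len(new))
```

In this example only two of the four blocks cross the wire, which is the point of the approach: the cost of each synchronization tracks the amount of change, not the size of the database.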
Unprecedented Visibility and Control over your Enterprise
SQL Sentry Event Manager is the ultimate scheduling, alerting and response system for optimizing
schedule performance of database servers and related IT resources. Event Manager provides DBAs
with unparalleled capabilities for managing SQL Agent jobs, Windows Tasks, and Oracle jobs in
increasingly complex cross-platform environments.


Key Features:

> Visual Schedule Management
> Alerting and Response System
> SSIS and DTS Support
> Cross-platform Support
> Chaining and Queuing
> Schedule Performance Monitoring


Key Benefits:

> Easy to install and use
> Distributed “Agent-less” deployment
> 100% .NET based application
> Lower database administration costs
> Reduce down time
> Improve application performance


Free Trial Download at: www.sqlsentry.net
TOOLS FOR DATA MANAGEMENT


Real Life SQL Server Superhero


I STAND TALL. Idera made me a real life SQL Server Superhero.


Idera delivers a new generation
of tools for managing the
world’s fastest growing database
management system: Microsoft
SQL Server. Battle-proven and
engineered for the enterprise, Idera
helps database administrators keep
SQL Server running at optimum
performance, ensure availability,
speed recovery, ease compliance
requirements, and dramatically
reduce administrative overhead.

All of Idera’s products are amazingly
simple to use, provide remarkable
results, and can be installed in
minutes, configured in hours, and
deployed worldwide in days.


